Recombinant phage and methods

ABSTRACT

This disclosure provides methods of cloning a phage genome. Also provided are methods of making a recombinant phage genome. In some embodiments the phage genome is engineered to comprise a heterologous nucleic acid sequence, for example a sequence comprising an open reading frame. In some embodiments the phage genome is cloned in a yeast artificial chromosome. Recombinant phage genomes and recombinant phage are also provided. In some embodiments the methods are high throughput methods such as methods of making a plurality of recombinant phage genomes or recombinant phage. Collections of recombinant phage genomes and recombinant phage are also provided.

RELATED APPLICATIONS

This is a continuation of U.S. patent application Ser. No. 13/627,060, filed on Sep. 26, 2012, allowed, which in turn claims priority to U.S. Provisional Patent Application No. 61/539,454, filed Sep. 26, 2011; U.S. Provisional Patent Application No. 61/549,743, filed Oct. 20, 2011; and U.S. Provisional Patent Application No. 61/642,691, filed May 4, 2012. The entire contents of each of those applications are hereby incorporated herein by reference.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING

The contents of the text file named “SAM6-006_C01US320350-2055_ST25.txt”, which was created on Jan. 7, 2016 and is 12 KB in size, are hereby incorporated by reference in their entireties.

INTRODUCTION

Model phage have been engineered using molecular biology techniques to deliver heterologous protein products to bacterial cells. For example, phage have been engineered to deliver enzymes to biofilms to digest the extracellular matrix and destroy the biofilm. (E.g., U.S. Patent Application Publication No. 2009/0155215.) Phage have also been engineered to express protein products that can be visualized in order to detect the presence of a particular type of bacterial cell that is susceptible to infection by the phage. (E.g., “Construction of Luciferase Reporter Bacteriophage A511::luxAB for Rapid and Sensitive Detection of Viable Listeria Cells,” M. J. Loessner et al., Applied and Environmental Microbiology, Vol. 62, No. 4, pp. 1133-40 (1996).) The natural host range of the phage engineered to date is a limitation, however, and those phage do not infect many relevant bacteria and biofilms.

Methods of engineering additional phage, with more varied host range, will contribute to expansion of the use of phage engineering technology. High throughput methods of creating variations in phage genomes and engineered phage genomes will also contribute to identification of phage with varied properties that are useful for diagnostic and therapeutic purposes. To date such methods have in general been lacking, however, and therefore additional methods of engineering phage will be useful.

Engineering diverse phage is generally made more difficult by the properties of phage genomes. For example, phage genomes have relatively few restriction sites and are heavily modified, making use of traditional cloning techniques with phage challenging. Phages also have compact genomes with very little non-coding DNA, which can make it challenging to find sites within the genome that are compatible with traditional engineering.

One approach for cloning phage DNA relies on isolating phage DNA, cutting the DNA with restriction enzymes, and transforming the DNA back into the host for recombination into viable phage. A second approach is to clone a part of a phage genome in a plasmid, engineer in a heterologous sequence and transfer that heterologous sequence into a relevant host strain. These cells can be infected with wild-type phages, allowing for homologous recombination between the phage and the heterologous sequence. Screening for recombinant phages will reveal the engineered phages. These techniques have succeeded in isolated instances. (E.g., “Construction of Luciferase Reporter Bacteriophage A511::luxAB for Rapid and Sensitive Detection of Viable Listeria Cells,” M. J. Loessner et al., Applied and Environmental Microbiology, Vol. 62, No. 4, pp. 1133-40 (1996).) However, the process must be completed before any engineered phage can be tested. The whole process must be repeated, end-to-end, for any new insertion site within a particular phage. If a site is not viable, the entire process must be repeated for the next insertion site.

The inventors sought to develop more useful methods of cloning phage DNA and creating genetically engineered phage by using transformation associated recombination techniques to clone whole phage genomes. This technique is described in N., Larionov, V., October 2006. TAR cloning: insights into gene function, long-range haplotypes and genome structure and evolution 7 (10), 805-812. In experiments with Lambda phage, no more than 83% of the total Lambda genome was verified. With this result, the process was deemed unsuitable for many uses.

The inventors have since surprisingly found that phage genomes are not lethal in yeast cells and thus that phage can be cloned into suitable vectors and propagated in yeast. The inventors have exploited this finding to develop recombinant vectors comprising phage genomes. In some embodiments the phage genome is engineered to comprise a heterologous nucleic acid sequence, for example a sequence comprising an open reading frame. The vectors are useful, for example, to make genetically modified phage. Also provided are methods of cloning a phage genome. Also provided are methods of making a recombinant phage genome. In some embodiments the phage genome is engineered to comprise a heterologous nucleic acid sequence, for example a sequence comprising an open reading frame. Recombinant phage genomes and recombinant phage are also provided. In some embodiments the methods are high throughput methods such as methods of making a plurality of recombinant phage genomes or recombinant phage. Collections of recombinant phage genomes and recombinant phage are also provided. These and other aspects of the disclosure are described more fully herein.

The methods and recombinant vectors, phage genomes, and phage provided herein are a major advancement over current phage engineering technologies which rely on in vitro strategies, which are generally inefficient and challenging to scale up, or on engineering phages within bacteria, which is generally problematic due to toxicity of phages to bacteria and the difficulty in maintaining the stability of large engineered genomes.

SUMMARY

In a first aspect, methods of making a cloned phage genome are provided. In some embodiments the methods comprise providing a vector, inserting a starting phage genome into the vector to provide a recombinant vector, and propagating the recombinant vector in a vector host cell that is not a phage host cell to thereby provide the cloned phage genome. In some embodiments of the methods the recombinant vector comprising a starting phage genome is made by a method comprising co-transforming the starting phage genome and the vector into a plurality of vector host cells, under conditions that allow insertion of the starting phage genome into the vector, and selecting a vector host cell comprising the recombinant vector as a result of insertion of the starting phage genome into the vector. In some embodiments of the methods the recombinant vector comprising a starting phage genome is made by a method comprising transforming the starting phage genome into a plurality of vector host cells comprising the vector, under conditions that allow insertion of the starting phage genome into the vector, and selecting a vector host cell comprising the recombinant vector as a result of insertion of the starting phage genome into the vector. In some embodiments the methods further comprise isolating the recombinant vector. In some embodiments the methods further comprise removing the cloned phage genome from the recombinant vector. In some embodiments the cloned phage genome is removed from the recombinant vector by a method comprising transforming the recombinant vector comprising the cloned phage genome into competent phage host cells, and culturing the phage host cells under conditions sufficient for production of phage particles comprising the cloned phage genome. In some embodiments the methods further comprise isolating the cloned phage genome. In some embodiments the vector is a yeast artificial chromosome and the vector host cell is a yeast cell.

In a second aspect methods of making a recombinant phage genome are also provided. In some embodiments the methods comprise providing vector host cells comprising a recombinant vector comprising a cloned phage genome, inserting a heterologous nucleic acid sequence into the starting phage genome to provide a recombinant phage genome, and selecting vector host cells comprising the recombinant vector comprising the recombinant phage genome to thereby provide the cloned phage genome. In some embodiments of the methods the recombinant vector comprising a cloned phage genome is made by a method comprising providing a vector, inserting a starting phage genome into the vector to provide the recombinant vector, and propagating the recombinant vector in a vector host cell that is not a phage host cell to thereby provide the recombinant vector comprising a cloned phage genome. In some embodiments of the methods the recombinant vector comprising a starting phage genome is made by a method comprising co-transforming the starting phage genome and the vector into a plurality of vector host cells, under conditions that allow insertion of the starting phage genome into the vector, and selecting a vector host cell comprising the recombinant vector as a result of insertion of the starting phage genome into the vector. In some embodiments of the methods the recombinant vector comprising a starting phage genome is made by a method comprising transforming the starting phage genome into a plurality of vector host cells comprising the vector, under conditions that allow insertion of the starting phage genome into the vector, and selecting a vector host cell comprising the recombinant vector as a result of insertion of the starting phage genome into the vector. In some embodiments the methods further comprise isolating the recombinant vector comprising the recombinant phage genome. In some embodiments the methods further comprise removing the recombinant phage genome from the recombinant vector. In some embodiments the recombinant phage genome is removed from the recombinant vector by a method comprising transforming the recombinant vector comprising the recombinant phage genome into competent phage host cells, and culturing the phage host cells under conditions sufficient for production of phage particles comprising the recombinant phage genome. In some embodiments the methods further comprise isolating the recombinant phage genome. In some embodiments the vector is a yeast artificial chromosome and the vector host cell is a yeast cell.

In a third aspect additional methods of making a recombinant phage genome are provided. In some embodiments the methods comprise providing a yeast artificial chromosome comprising a cloned phage genome, and inserting a heterologous nucleic acid sequence into the cloned phage genome to provide a recombinant phage genome. In some embodiments the heterologous nucleic acid sequence is inserted into the phage genome in vivo. In some embodiments the heterologous nucleic acid sequence is inserted into the phage genome in vitro. In some embodiments the methods further comprise removing the recombinant phage genome from the yeast artificial chromosome. In some embodiments the recombinant phage genome is removed from the yeast artificial chromosome by a method comprising transforming the yeast artificial chromosome into competent phage host cells, and selecting a recombinant phage genome that yields phage particles comprising the phage genome from transformed phage host cells.

In some embodiments of the second and third aspects, the heterologous nucleic acid sequence comprises 3.1 kilobases. In some embodiments the heterologous nucleic acid sequence comprises an open reading frame. In some embodiments the open reading frame encodes a marker that confers at least one phenotype selected from a selectable phenotype and a screenable phenotype on a vector host cell comprising the vector. In some embodiments the open reading frame encodes a marker that confers at least one phenotype selected from a selectable phenotype and a screenable phenotype on a phage host cell comprising the phage genome. In some embodiments the heterologous nucleic acid sequence comprises a second open reading frame. In some embodiments the open reading frame is operatively linked to an expression control sequence that directs expression of the open reading frame in at least one of a vector host cell and a phage host cell. In some embodiments the expression control sequence is endogenous to the phage genome. In some embodiments the expression control sequence is located within the heterologous nucleic acid sequence.

In some embodiments of the first, second, and third aspects, the methods comprise analyzing the sequence of the starting phage genome.

In some embodiments of the first, second, and third aspects, the methods do not comprise analyzing the sequence of the starting phage genome.

In a fourth aspect methods of making a phage are provided. In some embodiments a cloned and/or recombinant phage genome made by a method of this disclosure is transformed into a phage host cell and phage particles comprising the phage genome produced by the transformed phage host cells are isolated. In some embodiments the methods comprise providing a yeast artificial chromosome comprising a phage genome, transforming the yeast artificial chromosome into competent phage host cells, and isolating phage particles comprising the phage genome produced by the transformed phage host cells. In some embodiments the phage genome is recombinant.

In a fifth aspect a cloned phage genome made by a method of this disclosure is also provided.

In a sixth aspect a recombinant phage genome made by a method of this disclosure is also provided.

In a seventh aspect a phage comprising a genome made by a method of this disclosure is also provided.

In an eighth aspect a YAC comprising a cloned phage genome is also provided. In some embodiments the cloned phage genome is a recombinant phage genome comprising a heterologous nucleic acid sequence. In some embodiments the heterologous nucleic acid sequence is inserted into the cloned phage genome without deletion of endogenous phage genomic sequence. In some embodiments the heterologous nucleic acid sequence is inserted into the cloned phage genome and endogenous phage genomic sequence is deleted at the site of insertion. In some embodiments the heterologous nucleic acid sequence comprises 3.1 kilobases. In some embodiments the heterologous nucleic acid sequence comprises an open reading frame. In some embodiments the open reading frame encodes a marker that confers at least one phenotype selected from a selectable phenotype and a screenable phenotype on a vector host cell comprising the vector. In some embodiments the open reading frame encodes a marker that confers at least one phenotype selected from a selectable phenotype and a screenable phenotype on a phage host cell comprising the phage genome. In some embodiments the heterologous nucleic acid sequence comprises a second open reading frame. In some embodiments the open reading frame is operatively linked to an expression control sequence that directs expression of the open reading frame in at least one of a vector host cell and a phage host cell. In some embodiments the expression control sequence is endogenous to the phage genome. In some embodiments the expression control sequence is located within the heterologous nucleic acid sequence.

In a ninth aspect a vector host cell comprising a recombinant vector according to this disclosure is provided. In some embodiments the vector host cell is a yeast cell and the recombinant vector is a YAC.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an overview of a phage engineering platform that comprises the steps of extracting a starting phage genome, capturing the phage genome into a yeast artificial chromosome (YAC) to yield a YAC-phage, inserting a heterologous cassette into the captured phage genome, and transforming the engineered YAC-phage into a phage host cell capable of yielding phage particles comprising the engineered phage genome.

FIG. 2 shows a general strategy that may be used to capture a phage genome in a YAC vector. Stitching oligonucleotides that span the ends of the phage genome and sequences in the YAC are used to promote recombination between the phage genome and the YAC.
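The stitching strategy of FIG. 2 can be illustrated with a minimal sketch. It assumes a linearized YAC represented by two arm sequences and builds each stitching oligonucleotide by joining the end of one molecule to the adjacent end of the other. The function name, the 30-nucleotide overlap, and the single-strand representation are illustrative assumptions and are not taken from this disclosure.

```python
def design_stitching_oligos(yac_left_arm, yac_right_arm, phage_genome, overlap=30):
    """Return (left_oligo, right_oligo) bridging the YAC arms and the phage genome ends.

    Hypothetical helper: each oligo carries `overlap` bases from the YAC and
    `overlap` bases from the adjacent phage genome end, so that it spans the
    junction to be formed by recombination. The default overlap is an assumption.
    """
    shortest = min(len(yac_left_arm), len(yac_right_arm), len(phage_genome))
    if shortest < overlap:
        raise ValueError("every input sequence must be at least `overlap` bases long")
    # Left junction: end of the left YAC arm followed by the start of the phage genome.
    left_oligo = yac_left_arm[-overlap:] + phage_genome[:overlap]
    # Right junction: end of the phage genome followed by the start of the right YAC arm.
    right_oligo = phage_genome[-overlap:] + yac_right_arm[:overlap]
    return left_oligo, right_oligo


# Toy sequences for illustration only (not real YAC or phage sequences).
left, right = design_stitching_oligos("AAAACCCC", "GGGGTTTT", "ATGCATGCAT", overlap=4)
print(left, right)
```

In practice the overlap length, strand, and melting behavior of such oligonucleotides would be chosen experimentally; the sketch only shows how each oligonucleotide spans one junction.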

FIGS. 3A to 3E show an example of a cassette construct for insertion of the luciferase and Ura3 open reading frames into the T3 genome to replace either the 0.7 open reading frame or the 4.3 open reading frame (the alternative targets are both represented in the figure). FIG. 3A shows the cassette structure and oligonucleotides that may be used to amplify the Luc gene, the Ura3 gene, and the truncated Luc* gene. FIG. 3B shows recombination events that those fragments will undergo with the cloned T3 phage genome when introduced into a yeast cell comprising a YAC that comprises the cloned genome. FIG. 3C shows the resulting phage genome structure following recombination. Note that the recombined genome initially comprises the Luc gene, the Ura3 gene, and the truncated Luc* gene. As shown in FIG. 3D, if selection for Ura3 (which acts as a selectable marker in yeast grown in the absence of uracil) is removed then recombination between the homologous sequences in the Luc gene and the truncated Luc* gene (represented by arrows in the figure) will occur and can be selected for using counter selection with 5-FOA (FIG. 3E).
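The outcome of the recombination shown in FIGS. 3D and 3E can be modeled with a short sketch: recombination between two direct copies of a homologous sequence leaves a single copy and removes everything between them, including the Ura3 marker. The sequences below are toy strings chosen for illustration, not the actual Luc, Ura3, or T3 sequences, and the function name is hypothetical. The sketch models only the resulting structure, not the 5-FOA counter selection itself.

```python
def pop_out(sequence, repeat):
    """Collapse the first two direct copies of `repeat` into one copy.

    Models the product of homologous recombination between direct repeats:
    the intervening sequence and one repeat copy are lost.
    """
    first = sequence.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = sequence.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("a second, non-overlapping copy of the repeat is required")
    # Keep everything through the first copy, then resume after the second copy.
    return sequence[: first + len(repeat)] + sequence[second + len(repeat):]


# Toy locus: flank + Luc + Ura3 + truncated Luc* + flank (all hypothetical).
upstream, downstream = "GATC", "CTAG"
luc = "ATGAAACCCGGG"       # stand-in for full-length Luc
homology = "CCCGGG"        # region shared by Luc and Luc*
ura3 = "TTTTTT"            # stand-in for the Ura3 marker
luc_star = homology        # stand-in for the truncated Luc* fragment

locus = upstream + luc + ura3 + luc_star + downstream
resolved = pop_out(locus, homology)
print(resolved == upstream + luc + downstream)
```

The resolved locus retains an intact Luc stand-in and no marker, which corresponds to the structure that survives counter selection in the figure.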

FIG. 4 shows the relative luminescence units generated when a fixed amount of engineered T3 phage comprising heterologous luciferase or nanoluc open reading frames was used to infect E. coli NEB10 cells.

DETAILED DESCRIPTION

Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. Further, unless otherwise required by context, singular terms shall include the plural and plural terms shall include the singular. Generally, nomenclatures used in connection with, and techniques of, biochemistry, enzymology, molecular and cellular biology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art. Certain references and other documents cited herein are expressly incorporated herein by reference. Additionally, all UniProt/SwissProt records cited herein are hereby incorporated herein by reference. In case of conflict, the present specification, including definitions, will control. The materials, methods, and examples are illustrative only and not intended to be limiting.

The methods and techniques of the present disclosure are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. See, e.g., Clokie et al., Bacteriophages: Methods and Protocols, Vols. 1 and 2 (Methods in Molecular Biology, Vols. 501 and 502), Humana Press, New York, N.Y. (2009); Sambrook et al., Molecular Cloning: A Laboratory Manual, 3d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001); Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and Supplements to 2002); Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1990); Taylor and Drickamer, Introduction to Glycobiology, Oxford Univ. Press (2003); Worthington Enzyme Manual, Worthington Biochemical Corp., Freehold, N.J.; Handbook of Biochemistry: Section A Proteins, Vol I, CRC Press (1976); Handbook of Biochemistry: Section A Proteins, Vol II, CRC Press (1976); Essentials of Glycobiology, Cold Spring Harbor Laboratory Press (1999).

This disclosure refers to sequence database entries (e.g., UniProt/SwissProt or GENBANK records) for certain protein and gene sequences that are published on the internet, as well as other information on the internet. The skilled artisan understands that information on the internet, including sequence database entries, is updated from time to time and that, for example, the reference number used to refer to a particular sequence can change. Where reference is made to a public database of sequence information or other information on the internet, it is understood that such changes can occur and particular embodiments of information on the internet can come and go. Because the skilled artisan can find equivalent information by searching on the internet, a reference to an internet web page address or a sequence database entry evidences the availability and public dissemination of the information in question.

Before the present vectors, genomes, cells, phage, compositions, methods, and other embodiments are disclosed and described, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.

The term “comprising” as used herein is synonymous with “including” or “containing,” and is inclusive or open-ended and does not exclude additional, unrecited members, elements or method steps.

As used herein, the term “in vitro” refers to events that occur in an artificial environment, e.g., in a test tube or reaction vessel, in cell culture, in a Petri dish, etc., rather than within an organism (e.g., animal, plant, or microbe).

As used herein, the term “in vivo” refers to events that occur within an organism (e.g., animal, plant, or microbe).

As used herein, the term “isolated” refers to a substance or entity that has been (1) separated from at least some of the components with which it was associated when initially produced (whether in nature or in an experimental setting), and/or (2) produced, prepared, and/or manufactured by the hand of man. Isolated substances and/or entities may be separated from at least about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or more of the other components with which they were initially associated. In some embodiments, isolated agents are more than about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more than about 99% pure. As used herein, a substance is “pure” if it is substantially free of other components.

The term “peptide” as used herein refers to a short polypeptide, e.g., one that typically contains less than about 50 amino acids and more typically less than about 30 amino acids. The term as used herein encompasses analogs and mimetics that mimic structural and thus biological function.

The term “polypeptide” encompasses both naturally-occurring and non-naturally occurring proteins, and fragments, mutants, derivatives and analogs thereof. A polypeptide may be monomeric or polymeric. Further, a polypeptide may comprise a number of different domains each of which has one or more distinct activities. For the avoidance of doubt, a “polypeptide” may be any length greater than two amino acids.

The term “isolated protein” or “isolated polypeptide” refers to a protein or polypeptide that by virtue of its origin or source of derivation (1) is not associated with naturally associated components that accompany it in its native state, (2) exists in a purity not found in nature, where purity can be adjudged with respect to the presence of other cellular material (e.g., is free of other proteins from the same species), (3) is expressed by a cell from a different species, or (4) does not occur in nature (e.g., it is a fragment of a polypeptide found in nature or it includes amino acid analogs or derivatives not found in nature or linkages other than standard peptide bonds). Thus, a polypeptide that is chemically synthesized or synthesized in a cellular system different from the cell from which it naturally originates will be “isolated” from its naturally associated components. A polypeptide or protein may also be rendered substantially free of naturally associated components by isolation, using protein purification techniques well known in the art. As thus defined, “isolated” does not necessarily require that the protein, polypeptide, peptide or oligopeptide so described has been physically removed from a cell in which it was synthesized.

The term “polypeptide fragment” as used herein refers to a polypeptide that has a deletion, e.g., an amino-terminal and/or carboxy-terminal deletion compared to a full-length polypeptide, such as a naturally occurring protein. In an embodiment, the polypeptide fragment is a contiguous sequence in which the amino acid sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. Fragments typically are at least 5, 6, 7, 8, 9 or 10 amino acids long, or at least 12, 14, 16 or 18 amino acids long, or at least 20 amino acids long, or at least 25, 30, 35, 40 or 45 amino acids long, or at least 50 or 60 amino acids long, or at least 70 amino acids long.

The term “fusion protein” refers to a polypeptide comprising a polypeptide or fragment coupled to heterologous amino acid sequences. Fusion proteins are useful because they can be constructed to contain two or more desired functional elements that can be from two or more different proteins. A fusion protein comprises at least 10 contiguous amino acids from a polypeptide of interest, or at least 20 or 30 amino acids, or at least 40, 50 or 60 amino acids, or at least 75, 100 or 125 amino acids. The heterologous polypeptide included within the fusion protein is usually at least 6 amino acids in length, or at least 8 amino acids in length, or at least 15, 20, or 25 amino acids in length. Fusions that include larger polypeptides, such as an IgG Fc region, and even entire proteins, such as the green fluorescent protein (“GFP”) chromophore-containing proteins, have particular utility. Fusion proteins can be produced recombinantly by constructing a nucleic acid sequence which encodes the polypeptide or a fragment thereof in frame with a nucleic acid sequence encoding a different protein or peptide and then expressing the fusion protein. Alternatively, a fusion protein can be produced chemically by crosslinking the polypeptide or a fragment thereof to another protein.

As used herein, a protein has “homology” or is “homologous” to a second protein if the nucleic acid sequence that encodes the protein has a similar sequence to the nucleic acid sequence that encodes the second protein. Alternatively, a protein has homology to a second protein if the two proteins have similar amino acid sequences. (Thus, the term “homologous proteins” is defined to mean that the two proteins have similar amino acid sequences.) As used herein, homology between two regions of amino acid sequence (especially with respect to predicted structural similarities) is interpreted as implying similarity in function.

When “homologous” is used in reference to proteins or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions. A “conservative amino acid substitution” is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of a protein. In cases where two or more amino acid sequences differ from each other by conservative substitutions, the percent sequence identity or degree of homology may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. See, e.g., Pearson, 1994, Methods Mol. Biol. 24:307-31 and 25:365-89.

The following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine, Threonine; 2) Aspartic Acid, Glutamic Acid; 3) Asparagine, Glutamine; 4) Arginine, Lysine; 5) Isoleucine, Leucine, Methionine, Alanine, Valine; and 6) Phenylalanine, Tyrosine, Tryptophan.
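The six groups listed above can be expressed as a simple lookup, sketched here using standard one-letter amino acid codes. The function name is a hypothetical helper introduced for illustration, and treating an identical residue pair as "not a substitution" is an assumption of this sketch.

```python
# The six conservative substitution groups, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"S", "T"},                 # 1) Serine, Threonine
    {"D", "E"},                 # 2) Aspartic Acid, Glutamic Acid
    {"N", "Q"},                 # 3) Asparagine, Glutamine
    {"R", "K"},                 # 4) Arginine, Lysine
    {"I", "L", "M", "A", "V"},  # 5) Isoleucine, Leucine, Methionine, Alanine, Valine
    {"F", "Y", "W"},            # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative_substitution(residue_a, residue_b):
    """Return True if two different residues fall in the same group above."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues are not a substitution (sketch assumption)
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative_substitution("S", "T"))  # Serine -> Threonine
print(is_conservative_substitution("D", "K"))  # Aspartic Acid -> Lysine
```

Residues that appear in none of the six groups (for example glycine) are never reported as conservative by this lookup.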

Sequence homology for polypeptides, which is also referred to as percent sequence identity, is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wis. 53705. Protein analysis software matches similar sequences using a measure of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as “Gap” and “Bestfit” which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild-type protein and a mutein thereof. See, e.g., GCG Version 6.1.

An exemplary algorithm when comparing a particular polypeptide sequence to a database containing a large number of sequences from different organisms is the computer program BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).

Exemplary parameters for BLASTp are: Expectation value: 10 (default); Filter: seg (default); Cost to open a gap: 11 (default); Cost to extend a gap: 1 (default); Max. alignments: 100 (default); Word size: 11 (default); No. of descriptions: 100 (default); Penalty Matrix: BLOSUM62. The length of polypeptide sequences compared for homology will generally be at least about 16 amino acid residues, or at least about 20 residues, or at least about 24 residues, or at least about 28 residues, or more than about 35 residues. When searching a database containing sequences from a large number of different organisms, it may be useful to compare amino acid sequences. Database searching using amino acid sequences can be measured by algorithms other than blastp known in the art. For instance, polypeptide sequences can be compared using FASTA, a program in GCG Version 6.1. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990). For example, percent sequence identity between amino acid sequences can be determined using FASTA with its default parameters (a word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1, herein incorporated by reference.

In some embodiments, polymeric molecules (e.g., a polypeptide sequence or nucleic acid sequence) are considered to be “homologous” to one another if their sequences are at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identical. In some embodiments, polymeric molecules are considered to be “homologous” to one another if their sequences are at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% similar. The term “homologous” necessarily refers to a comparison between at least two sequences (nucleotide sequences or amino acid sequences). In some embodiments, two nucleotide sequences are considered to be homologous if the polypeptides they encode are at least about 50% identical, at least about 60% identical, at least about 70% identical, at least about 80% identical, or at least about 90% identical for at least one stretch of at least about 20 amino acids. In some embodiments, homologous nucleotide sequences are characterized by the ability to encode a stretch of at least 4-5 uniquely specified amino acids. Both the identity and the approximate spacing of these amino acids relative to one another must be considered for nucleotide sequences to be considered homologous. In some embodiments of nucleotide sequences less than 60 nucleotides in length, homology is determined by the ability to encode a stretch of at least 4-5 uniquely specified amino acids. In some embodiments, two protein sequences are considered to be homologous if the proteins are at least about 50% identical, at least about 60% identical, at least about 70% identical, at least about 80% identical, or at least about 90% identical for at least one stretch of at least about 20 amino acids.
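The "at least about 50% identical for at least one stretch of at least about 20 amino acids" criterion can be sketched as a sliding-window check. The sketch assumes the two sequences have already been aligned to equal length (with "-" marking gaps), treats the "about" thresholds as exact values, and uses a hypothetical function name; it does not reproduce the scoring performed by BLAST, FASTA, or GCG.

```python
def has_identical_stretch(aligned_a, aligned_b, window=20, min_identity=0.5):
    """Return True if any `window`-long stretch of the alignment meets `min_identity`.

    Assumes pre-aligned, equal-length sequences; gap characters never count
    as matches. Window size and threshold defaults are illustrative.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(aligned_a) < window:
        return False
    matches = [a == b and a != "-" for a, b in zip(aligned_a, aligned_b)]
    count = sum(matches[:window])  # matches in the first window
    if count / window >= min_identity:
        return True
    for i in range(window, len(matches)):
        # Slide the window one position: add the new column, drop the oldest.
        count += matches[i] - matches[i - window]
        if count / window >= min_identity:
            return True
    return False


# Toy aligned sequences for illustration only.
seq_a = "G" * 10 + "A" * 20
seq_b = "T" * 10 + "A" * 20
print(has_identical_stretch(seq_a, seq_b, min_identity=0.9))
```

Raising `min_identity` to 0.6, 0.7, 0.8, or 0.9 corresponds to the alternative thresholds recited above.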

As used herein, a “modified derivative” refers to polypeptides or fragments thereof that are substantially homologous in primary structural sequence to a reference polypeptide sequence but which include, e.g., in vivo or in vitro chemical and biochemical modifications or which incorporate amino acids that are not found in the reference polypeptide. Such modifications include, for example, acetylation, carboxylation, phosphorylation, glycosylation, ubiquitination, labeling, e.g., with radionuclides, and various enzymatic modifications, as will be readily appreciated by those skilled in the art. A variety of methods for labeling polypeptides and of substituents or labels useful for such purposes are well known in the art, and include radioactive isotopes such as ¹²⁵I, ³²P, ³⁵S, and ³H, ligands that bind to labeled antiligands (e.g., antibodies), fluorophores, chemiluminescent agents, enzymes, and antiligands that can serve as specific binding pair members for a labeled ligand. The choice of label depends on the sensitivity required, ease of conjugation with the primer, stability requirements, and available instrumentation. Methods for labeling polypeptides are well known in the art. See, e.g., Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates (1992, and Supplements to 2002).

As used herein, “polypeptide mutant” or “mutein” refers to a polypeptide whose sequence contains an insertion, duplication, deletion, rearrangement or substitution of one or more amino acids compared to the amino acid sequence of a reference protein or polypeptide, such as a native or wild-type protein. A mutein may have one or more amino acid point substitutions, in which a single amino acid at a position has been changed to another amino acid, one or more insertions and/or deletions, in which one or more amino acids are inserted or deleted, respectively, in the sequence of the reference protein, and/or truncations of the amino acid sequence at either or both the amino or carboxy termini. A mutein may have the same or a different biological activity compared to the reference protein.

In some embodiments, a mutein has, for example, at least 85% overall sequence homology to its counterpart reference protein. In some embodiments, a mutein has at least 90% overall sequence homology to the wild-type protein. In other embodiments, a mutein exhibits at least 95% sequence identity, or 98%, or 99%, or 99.5% or 99.9% overall sequence identity.

As used herein, a “polypeptide tag for affinity purification” is any polypeptide that has a binding partner that can be used to isolate or purify a second protein or polypeptide sequence of interest fused to the first “tag” polypeptide. Several examples are well known in the art and include a His-6 tag [SEQ ID NO: 46], a FLAG epitope, a c-myc epitope, a Strep-TAGII, a biotin tag, a glutathione S-transferase (GST), a chitin binding protein (CBP), a maltose binding protein (MBP), or a metal affinity tag.

As used herein, “recombinant” refers to a biomolecule, e.g., a gene or protein, that (1) has been removed from its naturally occurring environment, (2) is not associated with all or a portion of a polynucleotide in which the gene is found in nature, (3) is operatively linked to a polynucleotide which it is not linked to in nature, or (4) does not occur in nature. The term “recombinant” can be used in reference to cloned DNA isolates, chemically synthesized polynucleotide analogs, or polynucleotide analogs that are biologically synthesized by heterologous systems, as well as proteins and/or mRNAs encoded by such nucleic acids. Thus, for example, a protein synthesized by a microorganism is recombinant, for example, if it is synthesized from an mRNA synthesized from a recombinant gene present in the cell.

The term “polynucleotide”, “nucleic acid molecule”, “nucleic acid”, or “nucleic acid sequence” refers to a polymeric form of nucleotides of at least 10 bases in length. The term includes DNA molecules (e.g., cDNA or genomic or synthetic DNA) and RNA molecules (e.g., mRNA or synthetic RNA), as well as analogs of DNA or RNA containing non-natural nucleotide analogs, non-native internucleoside bonds, or both. The nucleic acid can be in any topological conformation. For instance, the nucleic acid can be single-stranded, double-stranded, triple-stranded, quadruplexed, partially double-stranded, branched, hairpinned, circular, or in a padlocked conformation.

A “synthetic” RNA, DNA or a mixed polymer is one created outside of a cell, for example one synthesized chemically.

The term “nucleic acid fragment” as used herein refers to a nucleic acid sequence that has a deletion, e.g., a 5′-terminal or 3′-terminal deletion compared to a full-length reference nucleotide sequence. In an embodiment, the nucleic acid fragment is a contiguous sequence in which the nucleotide sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. In some embodiments fragments are at least 10, 15, 20, or 25 nucleotides long, or at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 nucleotides long. In some embodiments a fragment of a nucleic acid sequence is a fragment of an open reading frame sequence. In some embodiments such a fragment encodes a polypeptide fragment (as defined herein) of the protein encoded by the open reading frame nucleotide sequence.

As used herein, an endogenous nucleic acid sequence in the genome of an organism (or the encoded protein product of that sequence) is deemed “recombinant” herein if a heterologous sequence is placed adjacent to the endogenous nucleic acid sequence, such that the expression of this endogenous nucleic acid sequence is altered. In this context, a heterologous sequence is a sequence that is not naturally adjacent to the endogenous nucleic acid sequence, whether or not the heterologous sequence is itself endogenous (originating from the same host cell or progeny thereof) or exogenous (originating from a different host cell or progeny thereof). By way of example, a promoter sequence can be substituted (e.g., by homologous recombination) for the native promoter of a gene in the genome of a host cell, such that this gene has an altered expression pattern. This gene would now become “recombinant” because it is separated from at least some of the sequences that naturally flank it.

A nucleic acid is also considered “recombinant” if it contains any modifications that do not naturally occur to the corresponding nucleic acid in a genome. For instance, an endogenous coding sequence is considered “recombinant” if it contains an insertion, deletion or a point mutation introduced artificially, e.g., by human intervention. A “recombinant nucleic acid” also includes a nucleic acid integrated into a host cell chromosome at a heterologous site and a nucleic acid construct present as an episome.

As used herein, the phrase “degenerate variant” of a reference nucleic acid sequence encompasses nucleic acid sequences that can be translated, according to the standard genetic code, to provide an amino acid sequence identical to that translated from the reference nucleic acid sequence. The term “degenerate oligonucleotide” or “degenerate primer” is used to signify an oligonucleotide capable of hybridizing with target nucleic acid sequences that are not necessarily identical in sequence but that are homologous to one another within one or more particular segments.
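By way of illustration only, the degenerate-variant test described above can be sketched in Python: two in-frame nucleotide sequences are translated with the standard genetic code and the resulting amino acid sequences are compared. The function names `translate` and `is_degenerate_variant` are hypothetical and are not part of this disclosure; the sketch assumes unambiguous bases and input lengths that are multiples of three.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA or RNA sequence; '*' marks a stop codon."""
    dna = nucleotides.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(reference: str, candidate: str) -> bool:
    """True if both sequences encode the same amino acid sequence."""
    return translate(reference) == translate(candidate)
```

For example, `is_degenerate_variant("ATGGCTTAA", "ATGGCCTGA")` evaluates to true under this sketch, because GCT and GCC both encode alanine and TAA and TGA are both stop codons.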

The term “percent sequence identity” or “identical” in the context of nucleic acid sequences refers to the residues in the two sequences which are the same when aligned for maximum correspondence. The length of sequence identity comparison may be over a stretch of at least about nine nucleotides, usually at least about 20 nucleotides, more usually at least about 24 nucleotides, typically at least about 28 nucleotides, more typically at least about 32, and even more typically at least about 36 or more nucleotides. There are a number of different algorithms known in the art which can be used to measure nucleotide sequence identity. For instance, polynucleotide sequences can be compared using FASTA, Gap or Bestfit, which are programs in Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, Wis. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990). For instance, percent sequence identity between nucleic acid sequences can be determined using FASTA with its default parameters (a word size of 6 and the NOPAM factor for the scoring matrix) or using Gap with its default parameters as provided in GCG Version 6.1, herein incorporated by reference. Alternatively, sequences can be compared using the computer program, BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).
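As a non-limiting sketch, percent sequence identity for two sequences that have already been aligned (for instance by one of the programs named above) can be computed as follows. The function name `percent_identity` is hypothetical. The sketch assumes that gaps are written as "-" and that the denominator is the number of alignment columns in which at least one sequence has a residue; alignment programs differ in how they choose the denominator, so values from a given tool may not match.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: a column with a gap in exactly one sequence counts as a
    mismatch; a column with gaps in both sequences is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no residue from either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / compared
```

Under these assumptions, an alignment of eight columns with seven matching residues yields 87.5% identity.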

The term “substantial homology” or “substantial similarity,” when referring to a nucleic acid or fragment thereof, indicates that, when optimally aligned with appropriate nucleotide insertions or deletions with another nucleic acid (or its complementary strand), there is nucleotide sequence identity in at least about 76%, 80%, 85%, or at least about 90%, or at least about 95%, 96%, 97%, 98% or 99% of the nucleotide bases, as measured by any well-known algorithm of sequence identity, such as FASTA, BLAST or Gap, as discussed above.

Alternatively, substantial homology or similarity exists when a nucleic acid or fragment thereof hybridizes to another nucleic acid, to a strand of another nucleic acid, or to the complementary strand thereof, under stringent hybridization conditions. “Stringent hybridization conditions” and “stringent wash conditions” in the context of nucleic acid hybridization experiments depend upon a number of different physical parameters. Nucleic acid hybridization will be affected by such conditions as salt concentration, temperature, solvents, the base composition of the hybridizing species, length of the complementary regions, and the number of nucleotide base mismatches between the hybridizing nucleic acids, as will be readily appreciated by those skilled in the art. One having ordinary skill in the art knows how to vary these parameters to achieve a particular stringency of hybridization.

In general, “stringent hybridization” is performed at about 25° C. below the thermal melting point (Tm) for the specific DNA hybrid under a particular set of conditions. “Stringent washing” is performed at temperatures about 5° C. lower than the Tm for the specific DNA hybrid under a particular set of conditions. The Tm is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989), page 9.51. For purposes herein, “stringent conditions” are defined for solution phase hybridization as aqueous hybridization (i.e., free of formamide) in 6×SSC (where 20×SSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% SDS at 65° C. for 8-12 hours, followed by two washes in 0.2×SSC, 0.1% SDS at 65° C. for 20 minutes. It will be appreciated by the skilled worker that hybridization at 65° C. will occur at different rates depending on a number of factors including the length and percent identity of the sequences which are hybridizing.
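The arithmetic implied by the preceding paragraph can be sketched as follows. The sketch simply scales the stated 20×SSC composition (3.0 M NaCl, 0.3 M sodium citrate) to a working strength and applies the general offsets of about 25° C. and about 5° C. below Tm. The function names and the example Tm value are hypothetical, and the Tm itself is assumed to be known from another source.

```python
# Composition of 20x SSC as stated in the text (molar concentrations).
SSC_20X_MOLAR = {"NaCl": 3.0, "sodium citrate": 0.3}


def ssc_molarity(fold: float) -> dict:
    """Molar concentrations of each SSC component at the given fold strength."""
    return {name: conc * fold / 20.0 for name, conc in SSC_20X_MOLAR.items()}


def general_stringent_temperatures(tm_celsius: float) -> dict:
    """Approximate temperatures from the general rule: Tm - 25 and Tm - 5."""
    return {
        "hybridization": tm_celsius - 25.0,
        "wash": tm_celsius - 5.0,
    }


if __name__ == "__main__":
    # 6x SSC is about 0.9 M NaCl and 0.09 M sodium citrate;
    # 0.2x SSC is about 0.03 M NaCl and 0.003 M sodium citrate.
    print(ssc_molarity(6))
    print(ssc_molarity(0.2))
    # Hypothetical hybrid with a Tm of 90 degrees C.
    print(general_stringent_temperatures(90.0))
```

Note that the specific “stringent conditions” defined above fix both hybridization and washing at 65° C.; the offsets from Tm apply only to the general rule.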

As used herein, an “expression control sequence” refers to polynucleotide sequences which are necessary to affect the expression of coding sequences to which they are operatively linked. Expression control sequences are sequences which control the transcription, post-transcriptional events and translation of nucleic acid sequences. Expression control sequences include appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., ribosome binding sites); sequences that enhance protein stability; and when desired, sequences that enhance protein secretion. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include promoter, ribosomal binding site, and transcription termination sequence. The term “control sequences” is intended to encompass, at a minimum, any component whose presence is essential for expression, and can also encompass an additional component whose presence is advantageous, for example, leader sequences and fusion partner sequences.

As used herein, “operatively linked” or “operably linked” expression control sequences refers to a linkage in which the expression control sequence is contiguous with the gene of interest to control the gene of interest, as well as expression control sequences that act in trans or at a distance to control the gene of interest.

As used herein, a “vector” is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid,” which generally refers to a circular double stranded DNA loop into which additional DNA segments may be ligated, but also includes linear double-stranded molecules such as those resulting from amplification by the polymerase chain reaction (PCR) or from treatment of a circular plasmid with a restriction enzyme. Other vectors include cosmids, bacterial artificial chromosomes (BAC) and yeast artificial chromosomes (YAC). Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome (discussed in more detail below). Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., vectors having an origin of replication which functions in the host cell). Other vectors can be integrated into the genome of a host cell upon introduction into the host cell, and are thereby replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors” (or simply “expression vectors”).

A “recombinant vector” is a vector into which a phage genome has been inserted. In some embodiments a starting phage genome is inserted. In some embodiments a recombinant phage genome is inserted. In some embodiments a starting phage genome is inserted and then is modified, in the vector, to create a recombinant phage genome in the vector.

The term “recombinant host cell” (or simply “recombinant cell” or “host cell”), as used herein, is intended to refer to a cell into which a recombinant nucleic acid such as a recombinant vector has been introduced. In some instances the word “cell” is replaced by a name specifying a type of cell. For example, a “recombinant microorganism” is a recombinant host cell that is a microorganism host cell. It should be understood that such terms are intended to refer not only to the particular subject cell but to the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term “recombinant host cell,” “recombinant cell,” and “host cell”, as used herein. A recombinant host cell may be an isolated cell or cell line grown in culture or may be a cell which resides in a living tissue or organism.

As used herein, “bacteriophage” refers to a virus that infects bacteria. Similarly, “archaeophage” refers to a virus that infects archaea. The term “phage” is used to refer to both types of viruses but in certain instances as indicated by the context may also be used as shorthand to refer to a bacteriophage or archaeophage specifically. Bacteriophage and archaeophage are obligate intracellular parasites that multiply inside bacteria/archaea by making use of some or all of the host biosynthetic machinery (i.e., viruses that infect bacteria/archaea). Though different bacteriophages and archaeophages may contain different materials, they all contain nucleic acid and protein, and can under certain circumstances be encapsulated in a lipid membrane. Depending upon the phage, the nucleic acid can be either DNA or RNA but not both and it can exist in various forms.

As used herein, “heterologous nucleic acid sequence” is any sequence placed at a location in the genome where it does not normally occur. A heterologous nucleic acid sequence may comprise a sequence that does not naturally occur in bacteria/archaea and/or phage or it may comprise only sequences naturally found in bacteria/archaea and/or phage, but placed at a non-normally occurring location in the genome. In some embodiments the heterologous nucleic acid sequence is not a natural phage sequence; in some embodiments it is a natural phage sequence, albeit from a different phage; while in still other embodiments it is a sequence that occurs naturally in the genome of the starting phage but is then moved to another site where it does not naturally occur, rendering it a heterologous sequence at that new site.

A “starting phage” or “starting phage genome” is a phage isolated from a natural or human made environment that has not been modified by genetic engineering, or the genome of such a phage.

A “recombinant phage” or “recombinant phage genome” is a phage that comprises a genome that has been genetically modified by insertion of a heterologous nucleic acid sequence into the genome, or the genome of the phage. In some embodiments the genome of a starting phage is modified by recombinant DNA technology to introduce a heterologous nucleic acid sequence into the genome at a defined site. In some embodiments the heterologous sequence is introduced with no corresponding loss of endogenous phage genomic nucleotides. In other words, if bases N1 and N2 are adjacent in the starting phage genome the exogenous sequence is inserted between N1 and N2. Thus, in the resulting recombinant genome the heterologous sequence is flanked by nucleotides N1 and N2. In some cases the heterologous sequence is inserted and endogenous nucleotides are removed or replaced with the exogenous sequence. For example, in some embodiments the exogenous sequence is inserted in place of some or all of the endogenous sequence which is removed. In some embodiments endogenous sequences are removed from a position in the phage genome distant from the site(s) of insertion of exogenous sequences.
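The two modes of modification described above (insertion between adjacent bases N1 and N2 with no loss of endogenous nucleotides, and insertion in place of a removed endogenous region) can be illustrated with a purely in silico sketch that treats a genome as a text string. The function names and toy sequences are hypothetical; positions are 0-based, and the sketch models only the resulting sequence, not any laboratory procedure.

```python
def insert_between(genome: str, position: int, heterologous: str) -> str:
    """Insert a sequence between genome[position - 1] (N1) and genome[position] (N2).

    No endogenous nucleotides are lost; the result is longer by len(heterologous).
    """
    if not 0 <= position <= len(genome):
        raise ValueError("position is outside the genome")
    return genome[:position] + heterologous + genome[position:]


def replace_region(genome: str, start: int, end: int, heterologous: str) -> str:
    """Replace endogenous nucleotides genome[start:end] with a heterologous sequence."""
    if not 0 <= start <= end <= len(genome):
        raise ValueError("region is outside the genome")
    return genome[:start] + heterologous + genome[end:]


if __name__ == "__main__":
    starting = "AAACCC"   # toy starting genome
    payload = "GG"        # toy heterologous sequence
    print(insert_between(starting, 3, payload))     # payload flanked by N1 = A and N2 = C
    print(replace_region(starting, 2, 4, payload))  # bases at positions 2-3 are removed
```

In the first case the flanking nucleotides N1 and N2 are unchanged and simply separated by the inserted sequence; in the second, the inserted sequence takes the place of the removed endogenous bases.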

A “phage host cell” is a cell that can be infected by a phage to yield progeny phage particles.

“Operatively linked” or “operably linked” expression control sequences refers to a linkage in which the expression control sequence is contiguous with coding sequences of interest to control expression of the coding sequences of interest, as well as expression control sequences that act in trans or at a distance to control expression of the coding sequence.

A “coding sequence” or “open reading frame” is a sequence of nucleotides that encodes a polypeptide or protein. The termini of the coding sequence are a start codon and a stop codon.

The term “expression control sequence” as used herein refers to polynucleotide sequences which affect the expression of coding sequences to which they are operatively linked. Expression control sequences are sequences which control the transcription, post-transcriptional events and translation of nucleic acid sequences. Expression control sequences include appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., ribosome binding sites); sequences that enhance protein stability; and when desired, sequences that enhance protein secretion. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include promoter, ribosomal binding site, and transcription termination sequence. The term “control sequences” is intended to include, at a minimum, all components whose presence is essential for expression, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.

As used herein, a “selectable marker” is a marker that confers upon cells that possess the marker the ability to grow in the presence or absence of an agent that inhibits or stimulates, respectively, growth of similar cells that do not express the marker. Such cells can also be said to have a “selectable phenotype” by virtue of their expression of the selectable marker. For example, the ampicillin resistance gene (AmpR) confers the ability to grow in the presence of ampicillin on cells which possess and express the gene. (See Sutcliffe, J. G., Proc Natl Acad Sci USA. 1978 August; 75(8): 3737-3741.) Other nonlimiting examples include genes that confer resistance to chloramphenicol, kanamycin, and tetracycline. Other markers include URA3, TRP and LEU that allow growth in the absence of uracil, tryptophan and leucine, respectively.

As used herein, a “screenable marker” is a detectable label that can be used as a basis to identify cells that express the marker. Such cells can also be said to have a “screenable phenotype” by virtue of their expression of the screenable marker. Suitable markers include a radiolabel, a fluorescent label, a nuclear magnetic resonance active label, a luminescent label, a chromophore label, a positron emitting isotope for PET scanner, chemiluminescence label, or an enzymatic label. Fluorescent labels include but are not limited to, green fluorescent protein (GFP), fluorescein, and rhodamine. Chemiluminescence labels include but are not limited to, luciferase and β-galactosidase. Enzymatic labels include but are not limited to peroxidase and phosphatase. A His tag may also be a detectable label. In some embodiments a heterologous nucleic acid is introduced into a cell and the cell then expresses a protein that is or comprises the label. For example, the introduced nucleic acid can comprise a coding sequence for GFP operatively linked to a regulatory sequence active in the cell.

As used herein, a “phage genome” includes naturally occurring phage genomes and derivatives thereof. Generally, the derivatives possess the ability to propagate in the same hosts as the parent. In some embodiments the only difference between a naturally occurring phage genome and a derivative phage genome is at least one of a deletion and an addition of nucleotides from at least one end of the phage genome if the genome is linear or at least one point in the genome if the genome is circular.

As used herein, a “vector host cell” is a cell that can host a given vector type through at least several cell division cycles. Thus, a vector host cell can replicate a vector introduced into the cell and partition copies of the vector to each daughter cell through at least several cell division cycles. For example, a yeast cell is a vector host cell for a yeast artificial chromosome (YAC) vector.

As used herein, a “phage host cell” is a cell that can form phage from a particular type of phage genomic DNA. In some embodiments the phage genomic DNA is introduced into the cell by infection of the cell by a phage. In some embodiments the phage genomic DNA is introduced into the cell using transformation or any other suitable technique. In some embodiments the phage genomic DNA is substantially pure when introduced into the cell. In some embodiments the phage genomic DNA is present in a vector when introduced into the cell. In one non-limiting exemplary embodiment the phage genomic DNA is present in the YAC that is introduced into the phage host cell. The phage genomic DNA is then copied and packaged into a phage particle following lysis of the phage host cell. The definition of “phage host cell” necessarily can vary from one phage to another. For example, E. coli may be a phage host cell for a particular type of phage while Salmonella enterica is not.

As used herein, a “competent phage host cell” is a phage host cell that a phage particle can infect, and in which the phage's genome can direct production of phage particles from the cell. Thus, not all “phage host cells” are “competent phage host cells,” but all “competent phage host cells” are “phage host cells.”

As used herein, the term “non-sequence specific process” used in relation to a process of insertion of a first nucleic acid sequence into a second nucleic acid sequence is a process in which the site of insertion in the second nucleic acid sequence is not determined prior to the insertion.

As used herein, a “transposase system” comprises a transposase enzyme or a nucleic acid capable of directing expression of the transposase, and a genetic element that can be mobilized by the enzyme. Typically the genetic element comprises sequences at either end necessary for mobilization and an internal heterologous sequence for insertion into a target nucleic acid. Non-limiting examples of transposase systems include Mos1 (mariner) (See Jacobsen et al., PNAS USA, Vol. 83, pp. 8684-8688 (1986)), Mu, Tn5 (kits and reagents available from Epicentre® (www.epicenre.com)), and piggybac (See U.S. Pat. No. 6,218,185).

As used herein, a “pre-determined position” in reference to the site of insertion of a heterologous nucleic acid sequence into a second nucleic acid sequence, such as a phage genome, means a site that was selected prior to insertion of the heterologous nucleic acid sequence into the second nucleic acid sequence.

A. Phage

Bacteriophage and archaeophage are obligate intracellular parasites that multiply inside bacteria/archaea by making use of some or all of the host biosynthetic machinery (i.e., viruses that infect bacteria/archaea). Though different phages may contain different materials, they all contain nucleic acid and protein, and may be covered by a lipid membrane. Depending upon the phage, the nucleic acid can be either DNA or RNA but not both and it can exist in various forms. The size of the nucleic acid varies depending upon the phage. The simplest phages only have genomes a few thousand nucleotides in size, while the more complex phages may have more than 100,000 nucleotides in their genome, in rare instances more than 1,000,000. The number of different kinds of protein and the amount of each kind of protein in the phage particle will vary depending upon the phage. The proteins function in infection and to protect the nucleic acid from nucleases in the environment.

Phages come in many different sizes and shapes. Most phages range in size from 24-200 nm in diameter. The head or capsid is composed of many copies of one or more different proteins. The nucleic acid is located in the head if it is present, which acts as a protective covering for it. Many but not all phages have tails attached to the phage head. The tail is a hollow tube through which the nucleic acid passes during infection. The size of the tail can vary and some phages do not even have a tail structure. In the more complex phages the tail is surrounded by a contractile sheath which contracts during infection of the bacterium. At the end of the tail, phages have a base plate and one or more tail fibers attached to it. The base plate and tail fibers are involved in the binding of the phage to the cell. Not all phages have base plates and tail fibers. In these instances other structures are involved in binding of the phage particle to the bacterium/archaea.

The first step in the infection process is the adsorption of the phage to the cell. This step is mediated by the tail fibers or by some analogous structure on those phages that lack tail fibers and it is reversible. The tail fibers attach to specific receptors on the cell and the host specificity of the phage (i.e. the bacteria/archaea that it is able to infect) is usually determined by the type of tail fibers that a phage has. The nature of the bacterial/archaeal receptor varies for different bacteria/archaea. Examples include proteins on the outer surface of the cell, LPS, pili, and lipoprotein. These receptors are on the cell for other purposes and phage have evolved to use these receptors for infection.

The attachment of the phage to the cell via the tail fibers is a weak one and is reversible. Irreversible binding of phage to a cell is mediated by one or more of the components of the base plate. Phages lacking base plates have other ways of becoming tightly bound to the cell.

The irreversible binding of the phage to the cell results in the contraction of the sheath (for those phages which have a sheath) and the hollow tail fiber is pushed through the bacterial/archaeal envelope. Phages that don't have contractile sheaths use other mechanisms to get the phage particle through the bacterial/archaeal envelope. Some phages have enzymes that digest various components of the envelope.

When the phage has gotten through the envelope the nucleic acid from the head passes through the hollow tail and enters the cell. Usually, the only phage component that actually enters the cell is the nucleic acid. The remainder of the phage remains on the outside of the cell. There are some exceptions to this rule. This is different from animal cell viruses in which most of the virus particle usually gets into the cell.

Lytic or virulent phages are phages which can only multiply on bacteria/archaea and kill the cell by lysis at the end of the life cycle. The life cycle of a lytic phage begins with an eclipse period. During the eclipse phase, no infectious phage particles can be found either inside or outside the cell. The phage nucleic acid takes over the host biosynthetic machinery and phage specified mRNAs and proteins are made. There is an orderly expression of phage directed macromolecular synthesis, just as one sees in animal virus infections. Early mRNAs code for early proteins which are needed for phage DNA synthesis and for shutting off host DNA, RNA and protein biosynthesis. In some cases the early proteins actually degrade the host chromosome. After phage DNA is made late mRNAs and late proteins are made. The late proteins are the structural proteins that comprise the phage as well as the proteins needed for lysis of the bacterial cell. Next, in the intracellular accumulation phase the nucleic acid and structural proteins that have been made are assembled and infectious phage particles accumulate within the cell. During the lysis and release phase the bacteria/archaea begin to lyse due to the accumulation of the phage lysis protein and intracellular phage are released into the medium. The number of particles released per infected cell can be as high as 1000 or more.

Lytic phage may be enumerated by a plaque assay. A plaque is a clear area in a lawn of bacteria/archaea grown on a solid medium that results from the lysis of bacteria/archaea. The assay is performed at a low enough concentration of phage that each plaque arises from a single infectious phage. The infectious particle that gives rise to a plaque is called a PFU (plaque forming unit).
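A plaque count is conventionally converted to a titer by dividing by the dilution plated and the volume plated. This formula is standard practice rather than something recited in this disclosure, and the function name, parameter names and example values below are hypothetical. The dilution is assumed to be expressed as a fraction of the original stock (e.g., 1e-6 for a million-fold dilution).

```python
def titer_pfu_per_ml(plaque_count: int, dilution: float, volume_plated_ml: float) -> float:
    """Estimate the titer of a phage stock in PFU/mL from a single plate.

    Assumes each plaque arose from one infectious particle, which holds only
    when the plate carries well-separated plaques.
    """
    if plaque_count < 0:
        raise ValueError("plaque count cannot be negative")
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return plaque_count / (dilution * volume_plated_ml)


if __name__ == "__main__":
    # Hypothetical plate: 50 plaques from 0.1 mL of a 1e-6 dilution,
    # which corresponds to roughly 5e8 PFU/mL in the undiluted stock.
    print(f"{titer_pfu_per_ml(50, 1e-6, 0.1):.2e} PFU/mL")
```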

Lysogenic or temperate phages are those that can either multiply via the lytic cycle or enter a quiescent state in the cell. In this quiescent state most of the phage genes are not transcribed; the phage genome exists in a repressed state. The phage DNA in this repressed state is called a prophage because it is not a phage but it has the potential to produce phage. In most cases the phage DNA actually integrates into the host chromosome and is replicated along with the host chromosome and passed on to the daughter cells. The cell harboring a prophage is not adversely affected by the presence of the prophage and the lysogenic state may persist indefinitely. The cell harboring a prophage is termed a lysogen.

The mechanisms of lysogeny differ between phage. In a classic example, phage lambda, lambda DNA is a double stranded linear molecule with small single stranded regions at the 5′ ends. These single stranded ends are complementary (cohesive ends) so that they can base pair and produce a circular molecule. In the cell the free ends of the circle can be ligated to form a covalently closed circle. A site-specific recombination event, catalyzed by a phage coded enzyme, occurs between a particular site on the circularized phage DNA and a particular site on the host chromosome. The result is the integration of the phage DNA into the host chromosome. A phage coded protein, called a repressor, is made which binds to a particular site on the phage DNA, called the operator, and shuts off transcription of most phage genes except the repressor gene. The result is a stable repressed phage genome which is integrated into the host chromosome. Each temperate phage will only repress its own DNA and not that from other phage, so that repression is very specific (immunity to superinfection with the same phage).

Whenever a lysogenic bacterium/archaeon is exposed to adverse conditions, the lysogenic state can be terminated. This process is called induction. Conditions which favor the termination of the lysogenic state include desiccation, exposure to UV or ionizing radiation, and exposure to mutagenic chemicals. Adverse conditions lead to the production of proteases (RecA protein) which destroy the repressor protein. This in turn leads to the expression of the phage genes, reversal of the integration process and lytic multiplication.

In some embodiments of this disclosure a starting phage genome comprises at least 5 kilobases (kb), at least 10 kb, at least 15 kb, at least 20 kb, at least 25 kb, at least 30 kb, at least 35 kb, at least 40 kb, at least 45 kb, at least 50 kb, at least 55 kb, at least 60 kb, at least 65 kb, at least 70 kb, at least 75 kb, at least 80 kb, at least 85 kb, at least 90 kb, at least 95 kb, at least 100 kb, at least 105 kb, at least 110 kb, at least 115 kb, at least 120 kb, at least 125 kb, at least 130 kb, at least 135 kb, at least 140 kb, at least 145 kb, at least 150 kb, at least 175 kb, at least 200 kb, at least 225 kb, at least 250 kb, at least 275 kb, at least 300 kb, at least 325 kb, at least 350 kb, at least 375 kb, at least 400 kb, at least 425 kb, at least 450 kb, at least 475 kb, at least 500 kb, or more.

In some embodiments of this disclosure a starting phage is a member of an order or family selected from Caudovirales, Microviridae, Corticoviridae, Tectiviridae, Leviviridae, Cystoviridae, Inoviridae, Lipothrixviridae, Rudiviridae, Plasmaviridae, and Fuselloviridae. In some embodiments the phage is a member of the order Caudovirales and is a member of a family selected from Myoviridae, Siphoviridae, and Podoviridae.

In some embodiments of this disclosure the phage is able to productively infect archaea. In some embodiments the archaea is a Euryarchaeota. In some embodiments the archaea is a Crenarchaeota. In some embodiments of this disclosure the phage is able to productively infect bacteria. In some embodiments the bacteria is a member of a phylum selected from Actinobacteria, Aquificae, Armatimonadetes, Bacteroidetes, Caldiserica, Chlamydiae, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Deinococcus-Thermus, Dictyoglomi, Elusimicrobia, Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes, Synergistetes, Tenericutes, Thermodesulfobacteria, and Thermotogae. In some embodiments the phage is able to productively infect at least one Firmicutes selected from Bacillus, Listeria, and Staphylococcus. In some embodiments the phage is able to productively infect at least one Proteobacteria selected from Acidobacillus, Aeromonas, Burkholderia, Neisseria, Shewanella, Citrobacter, Enterobacter, Erwinia, Escherichia, Klebsiella, Kluyvera, Morganella, Salmonella, Shigella, Yersinia, Coxiella, Rickettsia, Legionella, Avibacterium, Haemophilus, Pasteurella, Acinetobacter, Moraxella, Pseudomonas, Vibrio, and Xanthomonas. In some embodiments the phage is able to productively infect at least one Tenericutes selected from Mycoplasma, Spiroplasma, and Ureaplasma.

Phage genomes comprise end structures that present challenges to cloning an intact phage genome that retains the ability to infect target microbes and produce daughter phage. The methods of this disclosure are particularly useful because they enable the cloning of phage genomes with intact ends such that the cloned phage retain the ability to infect target microbes and produce daughter phage. In some embodiments the phage genome comprises terminal perfect repeats. In some embodiments the phage genome comprises imperfect repeats.

In some embodiments the repeats have a unit size of from 3 nucleotides to 20 kb. That is, each copy of the repeat “unit” is that long. In some embodiments the repeats have a unit size of from 5 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 10 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 25 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 50 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 100 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 250 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 500 nucleotides to 1 kb. In some embodiments the repeats have a unit size of from 100 nucleotides to 5 kb. In some embodiments the repeats have a unit size of from 250 nucleotides to 5 kb. In some embodiments the repeats have a unit size of from 500 nucleotides to 5 kb. In some embodiments the repeats have a unit size of from 1 kb to 5 kb. In some embodiments the repeats have a unit size of from 2 kb to 5 kb. In some embodiments the repeats have a unit size of from 3 kb to 5 kb. In some embodiments the repeats have a unit size of from 4 kb to 5 kb. In some embodiments the repeats have a unit size of from 100 nucleotides to 10 kb. In some embodiments the repeats have a unit size of from 250 nucleotides to 10 kb. In some embodiments the repeats have a unit size of from 500 nucleotides to 10 kb. In some embodiments the repeats have a unit size of from 1 kb to 10 kb. In some embodiments the repeats have a unit size of from 2 kb to 10 kb. In some embodiments the repeats have a unit size of from 5 kb to 10 kb.

In some embodiments the repeats have a total length (at at least one terminus) of from 3 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 10 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 25 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 50 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 100 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 250 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 500 nucleotides to 20 kb. In some embodiments the repeats have a total length of from 1 kb to 20 kb. In some embodiments the repeats have a total length of from 2 kb to 20 kb. In some embodiments the repeats have a total length of from 3 kb to 20 kb. In some embodiments the repeats have a total length of from 4 kb to 20 kb. In some embodiments the repeats have a total length of from 5 kb to 20 kb. In some embodiments the repeats have a total length of from 10 kb to 20 kb. In some embodiments the repeats have a total length of from 1 kb to 2 kb. In some embodiments the repeats have a total length of from 1 kb to 3 kb. In some embodiments the repeats have a total length of from 1 kb to 4 kb. In some embodiments the repeats have a total length of from 1 kb to 5 kb. In some embodiments the repeats have a total length of from 2 kb to 4 kb. In some embodiments the repeats have a total length of from 3 kb to 5 kb. In some embodiments the repeats have a total length of from 4 kb to 6 kb. In some embodiments the repeats have a total length of from 5 kb to 10 kb.

B. Phage Capture

1. Isolation of Phage Genomes

Any suitable method may be used to isolate phage genomes from phage cultures and/or isolated phage and/or concentrated phage preparations. For example, one or more of the following column-based, PEG-based, filter-based, and cesium chloride centrifugation methods may be used.

Column-Based:

High-titer lysates of a phage culture are further concentrated via chromatography based on charge and/or affinity, allowing the concentration of large volumes of lysate into very small volumes. Passing the phages over a column and then eluting into a small volume provides the material for DNA harvesting of phages for further genome manipulation.

PEG-Based:

The presence of high concentrations of polyethylene glycol allows precipitation of active phage particles from a lower-titer, high volume of phage material. This type of standard treatment allows greater than one hundred-fold concentration of phage lysates, allowing large amounts of DNA to be recovered for further genome manipulation.

Filter-Based:

Filtering lysates to remove large cell debris, followed by filtration in the 100 kDa size range, allows the retention of phage particles while losing water and salts in the phage lysate preparation. This is yet another technique for concentrating phages for isolation of large amounts of DNA for further phage genome manipulation.

Cesium Chloride Centrifugation:

Concentrated lysates are further purified by treating them with DNases to remove contaminating host DNA, followed by centrifugation in a cesium chloride gradient to purify the phage particles away from the cell debris. These highly purified lysates will produce very clean DNA for later manipulation.

Purification of DNA:

Regardless of the purification method of phage particles, phage lysates are optionally treated with proteases and chloroform to remove the phage coats, followed by either column-based DNA purification or ethanol precipitation of the recovered DNA. All DNA recovered at this step is ready for further capture and manipulation as outlined below.

Optional Sequencing of Phage Genomic DNA:

If the starting phage genomic sequence is unknown, the following process may optionally be used to generate a complete sequence:

First, next generation sequencing techniques may be used to generate contigs. Such methods generate large amounts of data that can be used to assemble contiguous pieces of phage sequence. This sequence is often not sufficient to close an entire phage genome with a single pass.

Remaining gaps may be filled using PCR-based techniques. Primers designed to anneal to the ends of contigs can be used in combination to do PCR on the phage genomic DNA. Only primers from contigs that are adjacent to each other will amplify a product. These PCR products can be sequenced by traditional Sanger sequencing to close the gaps between contigs.

Modified Sanger sequencing can be done directly off of phage genomic DNA. This technique can be used to sequence off of the ends of the phage given that PCR cannot be used to capture this final sequence. This will complete the phage genomic sequence.
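The combinatorial use of contig-end primers can be sketched as a simple enumeration. This is an illustrative assumption about how the pairings might be organized, not a procedure recited in this disclosure; the function name, the "L"/"R" end labels, and the contig names are hypothetical.

```python
from itertools import combinations


def candidate_primer_pairs(contig_names):
    """List primer pairings to try when closing gaps between contigs.

    Each contig is assumed to carry two outward-facing primers, one at its
    left end ("L") and one at its right end ("R"). Two primers from the
    same contig cannot bridge a gap, so those pairings are skipped.
    """
    primers = [(name, end) for name in contig_names for end in ("L", "R")]
    return [(a, b) for a, b in combinations(primers, 2) if a[0] != b[0]]


# Hypothetical assembly with three contigs: 12 pairings are worth testing.
pairs = candidate_primer_pairs(["contig1", "contig2", "contig3"])
```

Only the pairings whose primers sit on adjacent contigs are expected to yield a product, so the PCR results themselves indicate the order and orientation of the contigs.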

2. Capture of Phage Genomes in Yeast Artificial Chromosomes

Isolated phage genomes are then captured in a vector. Examples of suitable vectors include bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).

a. Homologous Recombination of Purely Linear Phage Genomes or Linear Phage Genomes with Imperfect Repeats Using Short Oligonucleotide Duplexes

For bacteriophage whose genome sequence is known, the genome can be recombined into a circular yeast artificial chromosome (YAC) using double strand break repair or other modes of recombination in yeast such as S. cerevisiae. This method may be used for phages with purely linear genomes or linear phage genomes with imperfect repeats at the ends. A replicating yeast vector with a selectable marker is first linearized, and “stitching” oligonucleotides are designed that contain sequence from the 3′ ends of the linear bacteriophage genome as well as DNA flanking the double strand break in the yeast vector. Suitable oligonucleotides are for example from 20 bp to 2 kb long, such as 20 to 500 bp long, 50 to 500 bp long, 100 to 500 bp long, 200 to 500 bp long, 100 to 750 bp long, 250 bp to 1 kb long, and 500 bp to 2 kb long. The phage genomic DNA, stitching oligonucleotides, and a linearized yeast vector are cotransformed into competent yeast cells and plated on selective media. This procedure represents a “clone DNA or die” strategy that provides a way of selecting for those linearized vectors that have formed circles through DNA recombination via homologous sequences at the ends of the vector and the phage genome. Colonies of yeast able to grow on selective media are then screened for presence of the junctions between the YAC DNA and the phage DNA, a DNA structure that only occurs if cloning of the phage DNA has been successful.
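The design of stitching oligonucleotides can be sketched in silico as follows. This is a minimal illustration under stated assumptions: the function names, the 40-nucleotide homology length on each side of a junction, and the representation of the vector as two flanking sequences are hypothetical choices made for the sketch and are not parameters recited in this disclosure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_stitching_oligos(phage_genome, vector_left_flank, vector_right_flank,
                            homology_len=40):
    """Build two junction-spanning duplexes for capturing a linear genome.

    vector_left_flank ends at the double strand break and vector_right_flank
    begins at it. Each duplex joins homology_len bases of vector sequence to
    homology_len bases of the adjacent phage genome end.
    """
    shortest = min(len(phage_genome), len(vector_left_flank), len(vector_right_flank))
    if homology_len < 1 or homology_len > shortest:
        raise ValueError("homology_len must fit within every input sequence")
    left_junction = vector_left_flank[-homology_len:] + phage_genome[:homology_len]
    right_junction = phage_genome[-homology_len:] + vector_right_flank[:homology_len]
    # Return (top strand, bottom strand) for each duplex.
    return [(j, reverse_complement(j)) for j in (left_junction, right_junction)]


# Toy sequences only; real inputs would be the phage and vector sequences.
duplexes = design_stitching_oligos("ACGTACGTAA", "GGGGCCCC", "TTTTAAAA", homology_len=4)
```

With the default of 40 bases per side, each duplex is 80 bp long, which falls within the 20 bp to 2 kb range given above.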

b. Homologous Recombination of Linear Phage Genomes with Perfect Repeats.

To capture phages with linear phage genomes that have perfect repeats at their ends, oligonucleotide duplexes may be used. The duplexes generally contain a portion that is homologous to the vector and a portion that is homologous to the phage genome, to stimulate homologous recombination between the vector and the phage genome for capture. The oligonucleotides are typically from 40 bases to 5 kb long, such as from 40 to 80 bases, from 50 to 100 bases, from 60 to 120 bases, from 80 to 160 bases, from 100 to 200 bases, from 200 to 400 bases, from 300 to 600 bases, from 400 to 800 bases, from 500 bases to 1 kb, from 1 to 2 kb or from 2 to 5 kb long.

These oligonucleotide duplexes are typically designed to capture varying portions of the phage genome. For example, in linear phage genomes with relatively short perfect repeats (for example, R-GGG-R, where R represents the perfect repeats and GGG represents the non-repeated phage genome sequence), 100% of the unique genome sequence can be captured by capturing one repeat with the non-repeated genome (for example, R-GGG), or more than 100% of the unique genome sequence can be captured by capturing both repeats with the non-repeated genome (for example, R-GGG-R).

c. End Structures of Captured Phage Genomes.

In some embodiments the full length phage genome is captured. In some embodiments from 1 nucleotide to 20 kb of sequence at one or both ends of the genome is absent from the captured genome. In some embodiments at least 2, 3, 4, 5 or 10 nucleotides of sequence at one or both ends of the genome is absent from the captured genome. In some embodiments at least 20, 40, 60, 80, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 nucleotides of sequence at one or both ends of the genome is absent from the captured genome. In some embodiments from 1 to 10 nucleotides, from 5 to 20 nucleotides, from 10 to 25 nucleotides, from 20 to 50 nucleotides, from 50 to 100 nucleotides, from 100 to 250 nucleotides, from 250 to 500 nucleotides, or from 500 to 1,000 nucleotides of sequence at one or both ends of the genome is absent from the captured genome. In some embodiments an integer number of repeats present at an end of the phage genome is absent from the captured genome. That is, if the phage naturally comprises 10 complete repeats of a sequence at each end of its genome, one or both ends of the captured genome may comprise fewer than 10 complete repeats. In all cases, any modifications of the phage genome at one end may be the same as a modification at the other end or may be different, and one end may be modified even if the other is not.

In some embodiments from 1 nucleotide to 20 kb of sequence at one or both ends of the genome is duplicated in the captured genome. In some embodiments at least 2, 3, 4, 5 or 10 nucleotides of sequence at one or both ends of the genome is duplicated in the captured genome. In some embodiments at least 20, 40, 60, 80, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 nucleotides of sequence at one or both ends of the genome is duplicated in the captured genome. In some embodiments from 1 to 10 nucleotides, from 5 to 20 nucleotides, from 10 to 25 nucleotides, from 20 to 50 nucleotides, from 50 to 100 nucleotides, from 100 to 250 nucleotides, from 250 to 500 nucleotides, or from 500 to 1,000 nucleotides of sequence at one or both ends of the genome is duplicated in the captured genome. In some embodiments an integer number of repeats present at an end of the phage genome is duplicated in the captured genome. That is, if the phage naturally comprises 10 complete repeats of a sequence at each end of its genome, one or both ends of the captured genome may comprise more than 10 complete repeats. In all cases, any modifications of the phage genome at one end may be the same as a modification at the other end or may be different, and one end may be modified even if the other is not.

3. Detection of Captured Phage Genomes

a. PCR-Based Methods.

Primers may be used to enable PCR-based confirmation of captured phage genomes. For example, if one primer is specific for a portion of the YAC vector just outside the region of the captured phage and another primer is specific for a portion of the phage genome, these primers should together amplify a band to verify that the proper phage-YAC capture and junctions are present in a vector.
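The expected size of such a junction band can be estimated in silico before the PCR is run. The following is a minimal sketch that assumes exact primer matches on a linear representation of the junction region; the function names and the toy sequences are hypothetical and are not taken from this disclosure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def expected_amplicon_length(template, forward_primer, reverse_primer):
    """Return the amplicon length for a primer pair, or None if no product.

    The forward primer must match the top strand exactly, and the reverse
    primer must match the bottom strand downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    site_start = template.find(reverse_site, start + len(forward_primer))
    if site_start == -1:
        return None
    return site_start + len(reverse_site) - start


# Toy junction: vector sequence followed by the start of a captured genome.
vector_end = "GATTACAGGC"
phage_start = "TTTCCCAAAGGGTCTC"
band = expected_amplicon_length(vector_end + phage_start, "GATTACA", "GAGACCC")
no_band = expected_amplicon_length(vector_end, "GATTACA", "GAGACCC")
```

A product is predicted only when the vector-specific and phage-specific primer sites lie on the same molecule, which mirrors the logic of the junction screen described above.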

b. Direct Sequencing.

An alternative is to directly sequence the captured phage genomes to confirm the presence of the phage DNA inside the vector.

c. Restriction Digestion.

Captured phage genomes may also be identified and characterized using restriction digestion and gel electrophoresis.

d. Phi29/Sequencing Readout.

Typically, the YAC bearing the phage genome is not maintained in high copy number per cell. To facilitate assaying for the presence of phage and engineered phage, the YAC may be amplified using a DNA polymerase from bacteriophage Phi29 that can copy the genome in vitro. These substrates may then be used for transformation and sequencing.

e. Phi29/RFLP Readout

Amplification of the phage-YACs with Phi29 polymerase allows for analysis with restriction enzymes to identify Restriction Fragment Length Polymorphisms (RFLPs) for rapid whole genome analysis. These products are run on agarose gels and analyzed via ethidium bromide staining.
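The fragment pattern expected from a restriction digest can be predicted from the sequence and compared with the gel. The sketch below assumes a circular phage-YAC and a palindromic recognition site, so that only one strand needs to be searched; the function name is hypothetical and the EcoRI site GAATTC is used purely as an illustration. The position of the cut within the site is ignored, which does not change the fragment sizes because every cut is offset by the same amount.

```python
def circular_digest_fragment_sizes(sequence: str, site: str):
    """Return fragment sizes (largest first) for a circular molecule.

    A circle with no site stays uncut and is reported as one fragment of
    full length; a single site linearizes the circle.
    """
    n = len(sequence)
    # Extend past the origin so that sites spanning it are also found.
    extended = sequence + sequence[:len(site) - 1]
    cuts = [i for i in range(n) if extended.startswith(site, i)]
    if not cuts:
        return [n]
    sizes = []
    for k, cut in enumerate(cuts):
        next_cut = cuts[(k + 1) % len(cuts)]
        sizes.append((next_cut - cut) % n or n)
    return sorted(sizes, reverse=True)


# Toy circle with two sites, 16 and 10 bases apart around the circle.
toy_circle = "GAATTC" + "A" * 10 + "GAATTC" + "A" * 4
fragments = circular_digest_fragment_sizes(toy_circle, "GAATTC")
```

Comparing the predicted sizes for the starting and engineered constructs indicates which bands are expected to shift when a heterologous sequence has been inserted.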

C. Engineering Captured Phage Genomes

In some embodiments a heterologous nucleic acid sequence is inserted into a starting phage genome to create a recombinant phage genome. In some embodiments the recombinant phage genome is further modified to create a different recombinant phage genome.

1. Heterologous Nucleic Acid Sequences

The heterologous nucleic acid sequence may be any nucleic acid sequence. In some embodiments the length of the heterologous nucleic acid sequence is at least 100 bases, at least 200 bases, at least 300 bases, at least 400 bases, at least 500 bases, at least 600 bases, at least 700 bases, at least 800 bases, at least 900 bases, at least 1 kilobase (kb), at least 1.1 kb, at least 1.2 kb, at least 1.3 kb, at least 1.4 kb, at least 1.5 kb, at least 1.6 kb, at least 1.7 kb, at least 1.8 kb, at least 1.9 kb, at least 2.0 kb, at least 2.1 kb, at least 2.2 kb, at least 2.3 kb, at least 2.4 kb, at least 2.5 kb, at least 2.6 kb, at least 2.7 kb, at least 2.8 kb, at least 2.9 kb, at least 3.0 kb, at least 3.1 kb, at least 3.2 kb, at least 3.3 kb, at least 3.4 kb, at least 3.5 kb, at least 3.6 kb, at least 3.7 kb, at least 3.8 kb, at least 3.9 kb, at least 4.0 kb, at least 4.5 kb, at least 5.0 kb, at least 5.5 kb, at least 6.0 kb, at least 6.5 kb, at least 7.0 kb, at least 7.5 kb, at least 8.0 kb, at least 8.5 kb, at least 9.0 kb, at least 9.5 kb, at least 10 kb, or more. In some such embodiments the heterologous nucleic acid sequence comprises a length that is less than or equal to the maximum length of heterologous nucleic acid sequence that can be packaged into a phage particle comprising the phage genome. In some such embodiments the heterologous nucleic acid sequence comprises a length that is less than or equal to a length chosen from 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, and 10 kb.

In some embodiments the length of the heterologous nucleic acid sequence is from 100 to 500 bases, from 200 to 1,000 bases, from 500 to 1,000 bases, from 500 to 1,500 bases, from 1 kb to 2 kb, from 1.5 kb to 2.5 kb, from 2.0 kb to 3.0 kb, from 2.5 kb to 3.5 kb, from 3.0 kb to 4.0 kb, from 3.5 kb to 4.5 kb, from 4.0 kb to 5.0 kb, from 4.5 kb to 5.5 kb, from 5.0 kb to 6.0 kb, from 5.5 kb to 6.5 kb, from 6.0 kb to 7.0 kb, from 6.5 kb to 7.5 kb, from 7.0 kb to 8.0 kb, from 7.5 kb to 8.5 kb, from 8.0 kb to 9.0 kb, from 8.5 kb to 9.5 kb, or from 9.0 kb to 10.0 kb.

In some embodiments the ratio of the length of the heterologous nucleic acid sequence to the total length of the genome of the recombinant phage is at least 0.05, at least 0.10, at least 0.15, at least 0.20, or at least 0.25. In some embodiments the ratio of the length of the genome of the recombinant phage to the length of the genome of the corresponding starting phage is at least 1.05, at least 1.10, at least 1.15, at least 1.20, or at least 1.25.

In some embodiments the heterologous nucleic acid sequence is inserted into the starting phage genome with no loss of endogenous starting phage genome sequence. In some embodiments the inserted heterologous nucleic acid sequence replaces endogenous starting phage genome sequence. In some such embodiments the heterologous nucleic acid sequence replaces an amount of endogenous genomic sequence that is less than the length of the heterologous nucleic acid sequence. Thus, the length of the recombinant phage genome is longer than the length of the starting phage genome. In some such embodiments the heterologous nucleic acid sequence replaces an amount of endogenous genomic sequence that is greater than the length of the heterologous nucleic acid sequence. Thus, the length of the recombinant phage genome is shorter than the length of the starting phage genome. In some such embodiments the heterologous nucleic acid sequence replaces an amount of endogenous genomic sequence that is equal to the length of the heterologous nucleic acid sequence.
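The length relationships described above reduce to simple arithmetic, sketched here for illustration. The function name and the example lengths (a 40,000 base starting genome and a 2,000 base insert) are hypothetical and are not values taken from this disclosure.

```python
def insert_length_ratios(starting_genome_len: int, insert_len: int, replaced_len: int = 0):
    """Return (insert / recombinant genome, recombinant genome / starting genome).

    replaced_len is the amount of endogenous sequence removed in favor of the
    heterologous sequence; it is zero for a pure insertion.
    """
    if starting_genome_len <= 0 or insert_len < 0:
        raise ValueError("lengths must be positive (insert_len may be zero)")
    if not 0 <= replaced_len <= starting_genome_len:
        raise ValueError("replaced_len must lie within the starting genome")
    recombinant_len = starting_genome_len - replaced_len + insert_len
    if recombinant_len == 0:
        raise ValueError("recombinant genome would be empty")
    return insert_len / recombinant_len, recombinant_len / starting_genome_len


# Pure insertion: the recombinant genome is 42,000 bases long.
insert_fraction, growth = insert_length_ratios(40_000, 2_000)
# Equal-length replacement: the genome length is unchanged.
_, unchanged = insert_length_ratios(40_000, 2_000, replaced_len=2_000)
```

In the pure insertion example the second ratio is 1.05, while the first ratio is slightly below 0.05 because the insert is counted in the denominator as well.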

In some embodiments the heterologous nucleic acid sequence comprises a first open reading frame.

In some embodiments the open reading frame encodes a marker that confers at least one phenotype on a vector host cell comprising the vector selected from a selectable phenotype and a screenable phenotype. In such embodiments the vector comprises an expression control sequence capable of directing expression of the open reading frame in the vector host cell. In some embodiments the selectable phenotype or the screenable phenotype is used to identify a host cell that comprises the vector comprising the phage genome comprising the open reading frame encoding the marker that confers at least one phenotype on a vector host cell comprising the vector selected from a selectable phenotype and a screenable phenotype. In some embodiments a portion of the vector outside of the phage genome comprises an open reading frame encoding a marker that confers at least one phenotype on a vector host cell comprising the vector selected from a selectable phenotype and a screenable phenotype. In some embodiments both the vector outside of the phage genome and the heterologous nucleic acid sequence inserted into the phage genome encode such a marker. In some embodiments the marker encoded by the open reading frame in the vector sequences and the marker encoded by the open reading frame in the heterologous nucleic acid sequence inserted into the phage genome are different.

In some embodiments the open reading frame encodes a protein that confers a phenotype of interest on a phage host cell expressing it. In some embodiments the phenotype of interest is simply expression of the expression product of the open reading frame. In some embodiments the phenotype of interest is a change in a structural feature of the phage host cell. In some embodiments the phenotype of interest is expression of a marker that confers at least one phenotype on a phage host cell comprising the phage genome selected from a selectable phenotype and a screenable phenotype. In such embodiments the open reading frame is operatively linked to an expression control sequence capable of directing expression of the open reading frame in a phage host cell. The expression control sequence may be located in the heterologous nucleic acid sequence or it may be in the endogenous phage genome sequence (i.e., it may be a sequence present in the starting phage genome). For example, the open reading frame may be inserted into the phage genome downstream of or in the place of an endogenous phage open reading frame sequence.

In some embodiments the open reading frame encodes a protein that serves as a marker that can be identified by screening of phage host cells infected by a recombinant phage comprising a heterologous nucleic acid sequence comprising the open reading frame. Examples of such markers include, by way of example and without limitation: a radiolabel, a fluorescent label, a nuclear magnetic resonance active label, a luminescent label, a chromophore label, a positron emitting isotope for a PET scanner, a chemiluminescence label, or an enzymatic label. Fluorescent labels include, but are not limited to, green fluorescent protein (GFP), fluorescein, and rhodamine. Chemiluminescence labels include, but are not limited to, luciferase and β-galactosidase. Enzymatic labels include, but are not limited to, peroxidase and phosphatase. A His-tag can also be used as a detectable label. In some embodiments a heterologous nucleic acid is introduced into a cell and the cell then expresses a protein that is or comprises the label. In some embodiments the open reading frame encodes a protein that is not normally produced by the phage host cell. Such a protein can be used as a marker that can be identified by screening, for example, by detecting the protein using an immunoassay. In some embodiments the screenable marker is detected in an assay to identify the presence of phage host cells in a sample. For example, the phage host cells can be a bacterial cell type that contaminates a food processing plant, and detection of expression of the screenable marker in the cells following mixing of the recombinant phage with the sample can be used as an assay to detect contamination of the food processing plant by the phage host cells.

In some embodiments the open reading frame encodes a protein selected from a nuclease, endonuclease, protease, glycosidase, glycanase, hydrolase, lyase, esterase, phosphodiesterase, cellulase, lysin, and kinase. In some embodiments the protein is any protein other than at least one of a nuclease, endonuclease, protease, glycosidase, glycanase, hydrolase, lyase, esterase, phosphodiesterase, cellulase, lysin, and kinase.

In some embodiments the open reading frame encodes a protein listed in Table 1.

TABLE 1

Common Name of Protein               EC         Substrate
Nattokinase                          3.4.21.62  protein, amyloids
Dispersin B                          3.2.1.52   beta-1,6-N-acetyl-D-glucosamine
Alginate lyase                       4.2.2.3    alginate
Alginate lyase                       4.2.2.11   alginate
NucA                                 3.1.30.2   DNA, RNA
Endoglucanase                        3.2.1.4    cellulose, lichenin, cereal beta-D-glucans
Subtilisin                           3.4.21.62  protein
A1pP                                            autolysis
Dnase A                                         DNA, RNA
Aqualysin                            3.4.21.62  protein
endX                                 3.1.21.-   DNA
Subtilisin-like protease                        protein
glucan endo-1,3-beta-glucosidase A1  3.2.1.39   beta-1,3-glucans in fungal cell walls
Thermonuclease                       3.1.31.1   DNA, RNA
Mycolysin                            3.4.24.31  protein, hydrophobic residues in P1′
DNAase I                             3.1.21.1   DNA
Proteinase K                         3.4.21.64  protein
Streptogrysin-C                      3.4.21.-   protein, similar to chymotrypsin, possibly specialized for chitin-like proteins
Streptogrysin-D                      3.4.21.-   protein, large aliphatic or aromatic amino acids
Streptogrisin-A                      3.4.21.80  protein, large aliphatic or aromatic amino acids
Streptogrisin-B                      3.4.21.81  protein, large aliphatic or aromatic amino acids
xanthan lyase                                   xanthan
beta-D-glucanase                                xanthan
ManA                                            endo-beta-1,4-mannose
                                                Quorum-sensing molecules
Gellan lyase                                    gellan
Sphinganase                                     gellan and similar polymers

In some embodiments the open reading frame encodes a screenable marker that may be used to detect phage host cells that express it. Such cells can also be said to have a screenable phenotype by virtue of their expression of the screenable marker. Any molecule that can be differentially detected upon expression in a phage host cell may serve as a screenable marker in this context. A screenable marker may be a nucleic acid molecule or a portion thereof, such as an RNA or a DNA molecule that is single or double stranded. Alternatively, a screenable marker may be a protein or a portion thereof. Suitable protein markers include enzymes that catalyze formation of a detectable reaction product. An example is a chemiluminescent protein such as luciferase or variations, such as luxAB, and β-galactosidase. Another example is the horseradish peroxidase enzyme. Proteins used to generate a luminescent signal fall into two broad categories: those that generate light directly (luciferases and related proteins) and those that are used to generate light indirectly as part of a chemical cascade (horseradish peroxidase). The most common bioluminescent proteins used in biological research are aequorin and luciferase. The former protein is derived from the jellyfish Aequorea victoria and can be used to determine calcium concentrations in solution. The luciferase family of proteins has been adapted for a broad range of experimental purposes. Luciferases from firefly and Renilla are the most commonly used in biological research. These proteins have also been genetically separated into two distinct functional domains that will generate light only when the proteins are closely co-localized. A variety of emission spectrum-shifted mutant derivatives of both of these proteins have been generated over the past decade. These have been used for multi-color imaging and co-localization within a living cell. The other groups of proteins used to generate chemiluminescent signal are peroxidases and phosphatases. Peroxidases catalyze the oxidation of luminol by peroxide in a reaction that generates light. The most widely used of these is horseradish peroxidase (HRP), which has been used extensively for detection in western blots and ELISAs. A second group of proteins that have been employed in a similar fashion are alkaline phosphatases, which remove a phosphate from a substrate molecule, destabilizing it and initiating a cascade that results in the emission of light.

Other suitable screenable markers include fluorescent proteins. Fluorescent proteins include but are not limited to blue/UV fluorescent proteins (for example, TagBFP, Azurite, EBFP2, mKalama1, Sirius, Sapphire, and T-Sapphire), cyan fluorescent proteins (for example, ECFP, Cerulean, SCFP3A, mTurquoise, monomeric Midoriishi-Cyan, TagCFP, and mTFP1), green fluorescent proteins (for example, EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, and mWasabi), yellow fluorescent proteins (for example, EYFP, Citrine, Venus, SYFP2, and TagYFP), orange fluorescent proteins (for example, Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, and mOrange2), red fluorescent proteins (for example, mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, and mRuby), far-red fluorescent proteins (for example, mPlum, HcRed-Tandem, mKate2, mNeptune, and NirFP), near-IR fluorescent proteins (for example, TagRFP657, IFP1.4, and iRFP), long Stokes-shift proteins (for example, mKeima Red, LSS-mKate1, and LSS-mKate2), photoactivatable fluorescent proteins (for example, PA-GFP, PAmCherry1, and PATagRFP), photoconvertible fluorescent proteins (for example, Kaede (green), Kaede (red), KikGR1 (green), KikGR1 (red), PS-CFP2, PS-CFP2, mEos2 (green), mEos2 (red), PSmOrange, and PSmOrange), and photoswitchable fluorescent proteins (for example, Dronpa). Several variants and alternatives to the listed examples are also well known to those of skill in the art and may be substituted in appropriate applications.

Other suitable markers include epitopes. For example, a proteincomprising an epitope that can be detected with an antibody or otherbinding molecule is an example of a screenable marker. An antibody thatrecognizes the epitope may be directly linked to a signal generatingmoiety (such as by covalent attachment of a chemiluminescent orfluorescent protein) or it can be detected using at least one additionalbinding reagent such as a secondary antibody, directly linked to asignal generating moiety, for example. In some embodiments the epitopeis not present in the proteins of the phage or the target microorganismso detection of the epitope in a sample indicates that the proteincomprising the epitope was produced by the microorganism followinginfection by the recombinant phage comprising a gene encoding theprotein comprising the epitope. In other embodiments the marker may be apurification tag in the context of a protein that is naturally presentin the target microorganism or the phage. For example, the tag (e.g., a6-His tag [SEQ ID NO: 46]) can be used to purify the heterologousprotein from other bacterial or phage proteins and the purified proteincan then be detected, for example using an antibody.

In some embodiments the heterologous nucleic acid sequence comprises atleast a first open reading frame and a second open reading frame. Insome embodiments the first and second open reading frames areoperatively linked to the same expression control sequences. In someembodiments the first and at least one second open reading frames areoperatively linked to different expression control sequences.

In some embodiments the first open reading frame encodes a marker thatconfers at least one phenotype on a vector host cell comprising thevector selected from a selectable phenotype and a screenable phenotype,and the second open reading frame encodes a gene product that is not amarker that confers at least one phenotype on a vector host cellcomprising the vector selected from a selectable phenotype and ascreenable phenotype. In some embodiments the second open reading frameconfers a phenotype of interest on a phage host cell expressing it.

One example of a heterologous nucleic acid cassette that may be used forhomologous recombination to introduce a heterologous nucleic acidsequence into a cloned phage genome is a cassette comprising a firstopen reading frame encoding the selectable marker URA3 and a second openreading frame encoding luciferase. In this cassette the URA3 openreading frame encodes a marker that confers at least one phenotype on avector host cell comprising the vector selected from a selectablephenotype and a screenable phenotype and the luciferase open readingframe encodes a protein that confers a phenotype of interest on a phagehost cell comprising a phage genome comprising the open reading frame.In this case the luciferase gene product produces a detectable signalupon exposure to substrate luciferin and this signal in turn allows fordetection of phage host cells infected by the engineered phage.

In some embodiments, all or part of a heterologous nucleic acid sequencepresent in a recombinant phage genome is deleted and/or replaced with adifferent heterologous nucleic acid sequence. The deletion and/orreplacement may be performed, for example, in a vector host cell. Insome embodiments a heterologous open reading frame is modified to encodea variant or mutein of the protein or polypeptide encoded by thestarting open reading frame. In some embodiments this is accomplishedusing directed evolution.

In some embodiments the protein or polypeptide encoded by a heterologous open reading frame is modified to reduce cleavage by proteases present in phage host cells. For example, computational algorithms can be used to identify known protease cleavage sites and the sequence of the open reading frame can be modified using conservative substitutions to remove these sites. Alternatively, directed mutagenesis is used to evolve the open reading frame sequence to encode a product that has an increased resistance to at least one protease present in a phage host cell or in the culture of a phage host cell.
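
By way of non-limiting illustration, the computational approach described above might be sketched in Python as follows. The motif table and the substitution map are hypothetical placeholders chosen only to make the sketch runnable; they are not taken from this disclosure, and a practical implementation would substitute a curated set of cleavage motifs for the proteases of the phage host cell of interest.

```python
import re

# Hypothetical motif table: name -> regex over one-letter amino acid codes.
# The single pattern here is a placeholder for illustration only.
CLEAVAGE_MOTIFS = {
    "dibasic": r"[KR][KR]",
}

# Hypothetical substitution map used to break a motif (an assumption,
# not a recommendation for any particular protein).
SUBSTITUTION = {"K": "Q", "R": "Q"}


def find_sites(protein):
    """Return (motif_name, start_index) for every match, overlaps included."""
    hits = []
    for name, pattern in CLEAVAGE_MOTIFS.items():
        # A lookahead reports overlapping matches as well.
        for m in re.finditer("(?=(%s))" % pattern, protein):
            hits.append((name, m.start()))
    return sorted(hits, key=lambda h: h[1])


def remove_sites(protein):
    """Substitute the second residue of the first match until none remain."""
    seq = list(protein)
    while True:
        hits = find_sites("".join(seq))
        if not hits:
            return "".join(seq)
        _, pos = hits[0]
        # Each pass removes one K or R, so the loop terminates.
        seq[pos + 1] = SUBSTITUTION[seq[pos + 1]]
```

For instance, `find_sites("MAKRLT")` reports one placeholder motif at index 2, and `remove_sites("MAKRLT")` returns a sequence in which that motif no longer occurs. Whether a given substitution preserves function would still need to be confirmed experimentally.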

The heterologous open reading frame can also be supercharged to enhance its stability when expressed in a phage host cell.

In some embodiments the heterologous open reading frame comprises a sequence that encodes a polypeptide tag, such that the expression product of the open reading frame comprises the tag fused to a polypeptide or protein encoded by the open reading frame.

2. Selection of Sites for Insertion of Heterologous Nucleic Acid Sequences into Phage Genomes

The expression of a heterologous open reading frame inserted into a phage genome will be influenced by many factors, including timing of expression in the phage lifecycle, promoter (transcriptional) strength, ribosome binding site (translational) strength, mRNA stability, protein degradation rates, codon usage, and others. Algorithms can be used to identify and predict sites within a phage genome that have desired expression properties.

Empirical algorithms are based on proteomic analysis of natural phage protein expression for at least one of temporal characteristics and absolute expression levels. For example, phage proteins can be tagged and expression levels monitored over time and/or under different conditions. Phage proteins exhibiting desirable expression traits are identified. In some embodiments the phage protein is expressed at a relatively high level. In some embodiments the phage protein is expressed over a relatively long period of the phage lifecycle. In some embodiments the phage protein is a structural protein such as a capsid component. Once a phage protein exhibiting a desirable expression trait is identified, a heterologous nucleic acid sequence comprising an open reading frame is inserted into the phage genome to either replace the open reading frame encoding the identified protein or to place the open reading frame within the heterologous nucleic acid sequence downstream of the open reading frame of the protein exhibiting a desirable expression trait.

Computational algorithms are used to identify phage promoters within phage genomic sequences. One such algorithm is provided in Lavigne et al., Bioinformatics, Vol. 20, No. 5, pp. 629-635 (2004). Promoters that exhibit sequence homology to well-known promoters are particularly useful because it can be predicted that such promoters are likely to exhibit desirable functional characteristics. Ribosomal binding site (RBS) strength of endogenous phage genomic sequences can be estimated using the RBS Calculator available at https://salis.psu.edu/software/ (hereby incorporated herein by reference). RBS sequences predicted to have high efficiency are particularly useful.
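
By way of non-limiting illustration, a first-pass ranking of candidate translation start sites might be sketched in Python as follows. This sketch is not the RBS Calculator and does not reproduce the algorithm of Lavigne et al.; it is a crude proxy that assumes the canonical Shine-Dalgarno core AGGAGG as a reference motif and simply counts matching positions in a window upstream of each ATG. The window length and scoring rule are assumptions made for illustration.

```python
SD_CORE = "AGGAGG"  # assumed reference motif (canonical Shine-Dalgarno core)


def sd_score(upstream):
    """Best count of positions matching SD_CORE over all 6-mers in the window."""
    best = 0
    for i in range(len(upstream) - len(SD_CORE) + 1):
        window = upstream[i:i + len(SD_CORE)]
        matches = sum(a == b for a, b in zip(window, SD_CORE))
        best = max(best, matches)
    return best


def rank_start_sites(genome, window=20):
    """Return (start_index, score) for each ATG, highest score first."""
    results = []
    pos = genome.find("ATG")
    while pos != -1:
        # Score only the bases immediately upstream of this start codon.
        upstream = genome[max(0, pos - window):pos]
        results.append((pos, sd_score(upstream)))
        pos = genome.find("ATG", pos + 1)
    # Sort by descending score, breaking ties by genome position.
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

Start sites that score well in such a screen would be candidates for closer analysis with a thermodynamic model or for direct experimental testing.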

DNA sequence homology can also be used to identify open reading frames which are known to be expressed at high levels in other well-characterized phages (for example open reading frames of T7, T3, T4, and lambda phage). In some embodiments the heterologous nucleic acid sequence replaces such an open reading frame or is placed downstream of such an open reading frame. Lack of DNA sequence homology can be used to identify open reading frames that are non-essential and are more likely to tolerate insertions.

Many phages have similar genomic structures. Based on these genomic structures, sequence comparisons between a subject phage and a well-characterized phage are used to identify locations for insertion of the heterologous nucleic acid sequence into a subject phage. For example, there are early, middle, and late genes in T7-like phages, which correspond to the temporal sequence in which they are expressed and correlate with position in the genome. Accordingly, homologous locations within a subject phage can be identified and a heterologous nucleic acid sequence inserted into an identified position.

Microarray experiments can identify which genes are turned on in early, middle and late stages of expression with little other information about the phage other than sequence. This is a quick method for getting a detailed expression profile of a novel phage.

The methods and vectors disclosed herein also make it feasible to testin parallel several different insertions into a phage genomeexperimentally. In some embodiments a plurality of insertion sites aretested to empirically identify insertion sites from which heterologousopen reading frames are expressed with desirable characteristics. Insome embodiments the insertion sites are random. In some embodiments theinsertion sites are at predetermined locations. In some embodiments thetested insertion sites are a combination of at least one randominsertion site and at least one predetermined insertion site.

In some embodiments a phage comprises a plurality of insertedheterologous nucleic acid sequences located at different sites withinthe phage genome. In some embodiments the inserted sequences are thesame. In some embodiments the plurality of inserted heterologoussequences comprises at least two different heterologous sequences. Insome embodiments the inserted heterologous sequences comprise openreading frames that are expressed at different levels at differentstages of the phage lifecycle.

Phage lysis is a competing factor for expression of heterologous open reading frames inserted into a phage genome. If a phage kills a host cell too early, then open reading frame expression may not reach a desired level. The phage lifecycle can be altered to enhance heterologous open reading frame expression. For example, expression of lysis proteins (such as lysins and holins) can be reduced by altering their ribosome binding sequences to thereby extend the phage lifecycle and delay lysis. In some embodiments this process is used to increase at least one of total heterologous open reading frame expression during a phage lifecycle and maximum heterologous open reading frame expression during a phage lifecycle.

3. Insertion of Heterologous Sequences into Phage Genomes

Cloning of phage genomes in vectors that allow propagation in cells that are not phage-host cells, as demonstrated herein, enables application of several methods known in the art to insert heterologous nucleic acid sequences into the cloned phage genome present in the recombinant vector. The heterologous nucleic acid sequence may be inserted in vivo in a vector host cell (e.g., a yeast cell) or in vitro using a recombinant vector isolated from a vector host cell.

Random Via Transposon Hopping.

In one method, random delivery of a known piece of DNA via transposon hopping is used to deliver a heterologous nucleic acid sequence to random sites in a cloned phage genome. In some embodiments transposon insertion occurs in vivo. In some embodiments transposon insertion occurs in vitro. In some embodiments the transposon is used to deliver an open reading frame encoding a selectable marker to a site in the phage genome. The engineered phage genome may be further modified to comprise a “handle” site comprising recognition sites for endonucleases in order to facilitate further genetic modification at the site.

Transposon delivery may provide random sampling of all the sites in the phage genome. After delivery of a transposon to a particular site in the phage genome, the resulting recombinant phage may be tested for viability (their ability to form phage particles) and optionally for at least one additional phage phenotype. In this way phage genomes comprising an inserted heterologous DNA may be screened to identify those having desirable characteristics. If the recombinant phage already carries a selectable marker, this test simultaneously assays for the insertion site tolerating genetic change and also for the phage and the insertion site tolerating the size of the inserted heterologous nucleic acid. Any insertion events that are tolerated are selected for and taken forward as sites for optional future genetic modification and transgene delivery.

Homologous Recombination

Homologous recombination may be used to insert a linear cassette into acloned phage genome. In some embodiments the linear cassette comprisesan open reading frame that encodes a selectable marker. In someembodiments the selectable marker confers at least one phenotype on avector host cell comprising the phage genome selected from a selectablephenotype and a screenable phenotype. In such embodiments the selectableor screenable phenotype may be used to identify vector host cells thatcomprise a recombinant vector comprising the heterologous nucleic acidsequence. In some embodiments the heterologous nucleic acid sequencecomprises an open reading frame that encodes a gene product thatexpresses a protein of interest in a phage host cell comprising a phagegenome comprising the open reading frame. In some embodiments theselectable marker gene product and the gene product that expresses aprotein of interest in a phage host cell comprising a phage genomecomprising the open reading frame are the same. However, in severalembodiments the selectable marker gene product and the gene product thatexpresses a protein of interest in a phage host cell comprising a phagegenome comprising the open reading frame are different. In suchembodiments the heterologous nucleic acid sequence comprises at leasttwo open reading frames, a first open reading frame encoding theselectable marker and a second open reading frame encoding a geneproduct that expresses a protein of interest in a phage host cellcomprising a phage genome comprising the open reading frame.

In some embodiments the recombinant phage genome is created in a YAC ina form comprising both first and second open reading frames. In someembodiments that recombinant phage genome is transferred to a phage hostcell, as described below, such that the phage genome introduced into thephage host cell comprises both the first and second open reading frames.In some embodiments the first open reading frame that encodes theselectable marker that confers at least one phenotype on a vector hostcell comprising the phage genome selected from a selectable phenotypeand a screenable phenotype is removed from the recombinant phage genomebefore the recombinant phage genome is transferred to a phage host cell.For example, the open reading frame encoding the selectable marker maybe removed from the recombinant phage genome using homologousrecombination in yeast cells. Alternative methods such as Cre-loxPmediated recombination may also be used.

Homologous recombination in yeast is accomplished by creating a heterologous nucleic acid sequence comprising ends that are homologous to target sites in a cloned phage genome. If the heterologous nucleic acid sequence comprises an open reading frame encoding a selectable marker then insertion of the linear cassette into the phage-YAC may be selected for by plating on selective media (for example, media lacking uracil if the marker is URA3). The resulting phage-YACs will thus contain cassettes that comprise the selectable marker and thus the heterologous nucleic acid sequence. If the heterologous nucleic acid sequence comprises a second open reading frame that encodes a product that is not used for selection in yeast then this single selection also identifies recombinant phage-YACs comprising this second open reading frame.
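
By way of non-limiting illustration, the design of a linear cassette with homologous ends, and the product expected after a double crossover, might be modeled in Python as follows. The function names, the default arm length, and the string-based model of recombination are assumptions made for illustration. The sketch treats the genome as a plain string and assumes that each homology arm occurs only once in the target sequence.

```python
def build_cassette(genome, site, payload, arm=40):
    """Flank `payload` with the `arm` bases on each side of `site`."""
    if site < arm or site + arm > len(genome):
        raise ValueError("insertion site too close to a genome end")
    left = genome[site - arm:site]
    right = genome[site:site + arm]
    return left + payload + right


def recombine(genome, cassette, arm=40):
    """Model a double crossover between the cassette ends and the genome.

    Assumes each arm is unique in the genome; the first match is used.
    """
    left, right = cassette[:arm], cassette[-arm:]
    start = genome.find(left)
    if start == -1:
        raise ValueError("left homology arm not found in target genome")
    end = genome.find(right, start + arm)
    if end == -1:
        raise ValueError("right homology arm not found in target genome")
    # Everything from the left arm through the right arm is replaced
    # by the cassette, which carries both arms plus the payload.
    return genome[:start] + cassette + genome[end + arm:]
```

In this model an insertion at a given site leaves the flanking genome unchanged and places the payload exactly at that site. The same functions can model a replacement if the right arm is taken from a position downstream of the left arm.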

In some cases, removal of the selectable marker and extraneous sequencesof the cassette are desirable. This may be achieved by engineering shortdirect repeats within the cassette; these direct repeats can be targetedby host recombination machinery resulting in the excision of theintervening DNA and selected for under appropriate culture conditions.An example of this strategy is shown in FIG. 3. The cassette structureis shown in FIG. 3A. From left to right, the cassette contains sequenceelements A-Luc-B-URA3-C-Luc*-D, where Luc* is the 3′ terminal end of theLuc gene and is thus homologous to the Luc gene located in between A andB. URA3 is a selectable marker (any suitable marker may be substitutedfor URA3). Insertion of this cassette by vector host cell recombinationmachinery into the phage genome is shown in FIG. 3B. This figure showsthe general strategy used at the T3 0.7 and 4.3 genes as described inthe Examples (labeled T3_0.7/4.3 in FIG. 3B). Following recombinationthe locus will have the structure shown in FIG. 3C. Vector host cellscomprising the recombined vector may be selected on growth media lackinguracil.

Upon removal of selective pressure for the presence of the URA3 gene the host cells will recombine the homologous region shared by Luc and Luc*, resulting in phage-YACs which contain A-Luc-D only. If the selectable marker used is both selectable and counterselectable (as URA3 is, for example), then following selection of cells comprising the A-Luc-B-URA3-C-Luc*-D insertion (for example, on media without uracil), cells which have lost the selectable marker and thus are A-Luc-D through internal recombination (FIGS. 3D and 3E) may be counterselected (for example, by growth in media with 5-FOA when D is the URA gene). Variants of this strategy may be performed such that the scar DNA sequence remaining after the recombination is any arbitrary sequence.
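
By way of non-limiting illustration, the selection and counterselection logic described above might be modeled in Python as follows. The locus is represented as a list of the element names used in FIG. 3. The growth rules are a deliberately coarse assumption (a uracil-auxotrophic yeast host that grows without uracil only when URA3 is present, and on 5-FOA only when URA3 is absent) and are not a quantitative model.

```python
def pop_out(locus, repeat="Luc", partial="Luc*"):
    """Collapse the region between a gene and its downstream partial repeat."""
    i = locus.index(repeat)
    j = locus.index(partial)
    if j <= i:
        raise ValueError("partial repeat must lie downstream of the full copy")
    # A crossover within the shared sequence leaves one intact copy and
    # drops everything between the two copies.
    return locus[:i + 1] + locus[j + 1:]


def grows(locus, medium):
    """Coarse growth model for a uracil-auxotrophic host (an assumption)."""
    has_ura3 = "URA3" in locus
    if medium == "-Ura":
        return has_ura3       # selection: marker required
    if medium == "5-FOA":
        return not has_ura3   # counterselection: marker is toxic
    raise ValueError("unknown medium")
```

Applied to the list `["A", "Luc", "B", "URA3", "C", "Luc*", "D"]`, `pop_out` returns `["A", "Luc", "D"]`, which in this model grows on 5-FOA but not on media lacking uracil.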

4. Creating Phage Particles from Cloned Phage Genomes

Cloned phage genomes, whether genetically modified or not, may be usedto create phage particles. If the cloned phage genome is a recombinantgenome comprising a heterologous nucleic acid sequence the resultantphage particles will also be recombinant and in this way capable oftransferring the recombinant heterologous sequence to phage host cells,which in turn may result in expression of a recombinant gene productencoded by the heterologous nucleic acid sequence in the phage hostcells.

Choosing the method for converting engineered phage DNA constructs intoviable phage particles is based on one or more of a variety of factors.For example, size limitations for bacterial host transformation mayrestrict the efficiency of direct transformation of engineered phage DNAconstructs into host bacteria. The availability of highly competentstrains for transformation as surrogate hosts may enable efficientdelivery of phage DNA constructs into these surrogates prior toamplification on other susceptible hosts. In some embodiments theability of bacterial types to perform homologous recombination onsmaller DNA fragments to assemble longer DNA fragments allows for thetransformation of smaller engineered phage DNA fragments into hostsfollowed by in-cell assembly back into functional phage genomes.

Direct Transformation.

The examples herein demonstrate transformation of engineered phage genomes directly as phage-YAC DNA into an appropriate host cell. These phage-YACs replicate, excise and package into infectious phage particles capable of repeated infection.

In this method, engineered YACs are recovered from yeast transformants comprising the YACs. In some embodiments this is accomplished by disrupting the yeast transformant by glass bead lysis, thereby releasing the YACs from the transformed cells. The released YACs bearing phage are electroporated into an appropriate phage host cell and plated in a standard plaque assay. The inventors have produced plaques from a transformation of YACs bearing phage genomes. To date this has been successfully accomplished using E. coli phages (T3 and T7) and Salmonella phage (FelixO1). These results demonstrate production of functional phage from cloned phage genomes.

Liberation of Phage DNA, Followed by Direct Transformation.

Not all phages will tolerate the presence of foreign DNA at a terminus. To mitigate this, linearization of vectors to remove the exogenous DNA and liberate phage genomic DNA is used to improve transformation efficiency. To that end, in some embodiments cloning vectors designed to allow flush cutting of the vector to liberate phage DNA that recapitulates the original phage genome are used. In some embodiments the cloning vectors are created to comprise meganuclease recognition sites for this purpose. Incubating this DNA with phage extracts, for example, further protects the ends and improves transformation efficiency.

Circularization.

Some phage genomes require a circularized state to produce viable phageparticles in host bacteria. Accordingly, in some embodiments plasmidscomprising a phage genome surrounded by recombinase recognition sitesare used. Upon expression of the recombinase, either in bacteria, yeast,or in vitro, the phage genome is circularized, creating a genomestructure that supports production of viable phages.

Alternatively, phage genomes are excised from vectors using restriction enzymes to digest DNA at or near their ends and then circularized using DNA ligase.

Surrogate Transformation.

Phage host-range is often determined by the presence or absence ofreceptors on the surface of the cell. Closely related organisms that uselargely the same replication, transcription and translation machinerymay actually be cross-resistant to different phages due to externalcell-surface factors. In addition, some bacterial hosts are easier totransform than others. In view of this, genetically tractable, relatedbacterial strains may be used to make phage bursts from engineered phageDNA constructs. Accordingly, in some embodiments, the cloned phagegenomic DNA is transformed into a surrogate strain, recovered after aperiod of time, and then the phage lysate is exposed to a sensitive hostfor propagation of the lysate into a higher titer lysate. In this waysurrogate transformation (also called trans-transformation) allowsrecovery of phages from hosts that are otherwise un-transformable.

For example, an engineered Salmonella phage DNA construct may be transformed into E. coli efficiently because of the high transformation efficiency of E. coli, and the resulting lysate collected and used to infect Salmonella host cells for subsequent phage propagation. This was done for the Salmonella phage FelixO1. An infectious lysate was obtained after grow-out of an E. coli culture that had been electroporated with phage-YAC DNA.

This method may be used with gram-negative surrogates and gram-negative hosts, gram-negative surrogates and gram-positive hosts, gram-positive surrogates and gram-positive hosts, and gram-positive surrogates and gram-negative hosts.

Surrogate Transformation Followed by Conjugation.

An alternative to transformation of engineered phage DNA into a surrogate host bacterium followed by bursting and amplification on a different susceptible host strain (“surrogate transformation” as described above) is the transformation of engineered phage DNA into a surrogate host bacterium followed by conjugation of the engineered phage DNA construct into a different susceptible host strain. This method is useful for engineering phages which have difficult-to-transform hosts. For example, a gram-positive bacterial host may be difficult to directly transform with an engineered phage DNA construct. In this case, the phage DNA construct in a vector that contains conjugation machinery is transformed into a surrogate bacterial strain (such as E. coli) which is then capable of conjugating the phage DNA construct into a different susceptible host strain (such as the gram-positive host of the phage).

5. Verifying Engineered Phages

Recombinant phage made or derived from a cloned phage genome may be characterized in a number of ways. The genome structure of such phage may be characterized using PCR screening, restriction digestion, sequencing, or a combination thereof. For example, primers that flank the desired insertion site of the heterologous nucleic acid sequence in the phage genome may be designed and used to identify the presence of the heterologous nucleic acid sequence based on successful PCR amplification of the fragment. qPCR primers can also be used to detect the presence of genetic changes such as insertions, deletions, or substitutions. Phage genomic DNA from viable phage particles can be purified and subjected to restriction digestion and analysis to confirm genomic structure. Direct sequencing may also be used to provide a high-resolution view of genome structure.
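
By way of non-limiting illustration, the amplicon sizes expected from a flanking-primer PCR screen might be computed in Python as follows. The function name and coordinate convention are assumptions made for illustration, and the sketch assumes a simple insertion that does not delete any native sequence.

```python
def expected_amplicons(fwd_start, rev_end, site, insert_len):
    """Return (wild_type_bp, recombinant_bp) for primers flanking `site`.

    fwd_start: genome coordinate of the 5' end of the forward primer
    rev_end:   genome coordinate just past the last base covered by the
               reverse primer (half-open interval)
    """
    if not fwd_start < site < rev_end:
        raise ValueError("primers must flank the insertion site")
    if insert_len < 0:
        raise ValueError("insert length must be non-negative")
    wild_type = rev_end - fwd_start
    # A simple insertion lengthens the product by the insert length.
    return wild_type, wild_type + insert_len
```

With hypothetical coordinates, `expected_amplicons(1000, 1500, 1200, 1653)` gives a 500 bp product for the unmodified genome and a 2153 bp product for the recombinant genome, a difference that is readily resolved on a gel.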

Phenotypic screening may also be used to characterize recombinant phage particles. In some embodiments recombinant phage and libraries of recombinant phage are screened to identify phenotypes of interest. In some embodiments phenotypic screening is used directly as an assay for recombinant phage of interest. For example, phage may be screened for biofilm removal or bacterial detection.

In some embodiments enzyme assays for the expression products of the heterologous nucleic acid sequences present in the recombinant phage give a good indication of optimal phage properties. Examples include phages with high levels of luciferase expression, or with high levels of xylanase expression to remove xylans from a biofilm matrix.

In some embodiments competition experiments identify phages that carry properties of interest, optionally including selected growth characteristics. Mixing phages together and recovering the dominant phages at the end of a mixed infection is used in some embodiments to identify phages that carry a combination of properties of interest.

D. Methods of Making Collections of Engineered Phages and Collections of Engineered Phages

The methods disclosed herein allow for high throughput generation of diverse collections of recombinant phage. The collections may be designed to include at least one of a plurality of different starting phage genomes, a plurality of inserted heterologous nucleic acid sequences, and a plurality of different insertion sites of the heterologous nucleic acid sequences into a starting phage genome.
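
By way of non-limiting illustration, the three axes of diversity described above (starting genome, inserted sequence, and insertion site) might be enumerated in Python as follows. The function name and the dictionary layout are assumptions made for illustration, and the genome, insert, and site labels in the usage note are placeholders.

```python
from itertools import product


def design_collection(genomes, inserts, sites_by_genome):
    """Enumerate every (genome, insert, site) combination as a dict.

    sites_by_genome maps each starting genome to its candidate
    insertion sites; genomes without an entry contribute no designs.
    """
    designs = []
    for genome, insert in product(genomes, inserts):
        for site in sites_by_genome.get(genome, []):
            designs.append({"genome": genome, "insert": insert, "site": site})
    return designs
```

For example, two starting genomes, two inserts, and candidate sites `{"T3": ["0.7", "4.3"], "T7": ["0.7"]}` yield six designs, each of which could then be built as a separate recombinant vector.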

In some embodiments the plurality of recombinant vectors comprises a plurality of different heterologous nucleic acid sequences. The heterologous nucleic acid sequences may differ in one or more ways. For example, the heterologous nucleic acid sequences may comprise different open reading frames that encode different products. Alternatively or in addition the heterologous nucleic acid sequences may comprise different expression control sequences that direct expression of an open reading frame in a different manner, such as at a different maximum level of expression or in a different temporal profile during a phage infection lifecycle. For example, the expression control sequences may differ in promoter or ribosome binding site. The heterologous nucleic acid sequences may also differ in length or nucleotide composition. In some embodiments the plurality of heterologous insertion sequences consists of sequences that each differ from every other sequence by at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, or at least 50% at the nucleotide level. In some embodiments the plurality of heterologous insertion sequences consists of sequences that comprise open reading frames, and the open reading frames each differ from every other open reading frame sequence by at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, or at least 50% at the nucleotide level. In some embodiments the plurality of heterologous insertion sequences consists of sequences that comprise open reading frames, and the open reading frames encode products that each differ from every other open reading frame encoded product by at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, or at least 50% at the amino acid level.
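
By way of non-limiting illustration, a check that every member of a set of sequences differs from every other member by at least a stated percentage might be sketched in Python as follows. The sketch assumes that the sequences have already been aligned to equal length and counts mismatched positions only; handling of gaps and of unequal lengths would depend on the alignment method chosen and is not addressed here.

```python
from itertools import combinations


def percent_difference(a, b):
    """Percent of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("empty sequences")
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


def all_differ_by(seqs, threshold):
    """True if every pair of sequences differs by at least `threshold` percent."""
    return all(
        percent_difference(a, b) >= threshold
        for a, b in combinations(seqs, 2)
    )
```

The same functions apply to nucleotide sequences and to amino acid sequences, since both are compared position by position as strings.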

In some embodiments the plurality of recombinant vectors comprises aplurality of different heterologous nucleic acid sequences and at least5 different heterologous nucleic acid sequences are present in theplurality of recombinant vectors. In some embodiments at least 10, atleast 15, at least 20, at least 25, at least 30, at least 35, at least40, at least 45, at least 50, at least 60, at least 70, at least 75, atleast 80, at least 85, at least 90, at least 95, at least 100, at least200, at least 300, at least 400, or at least 500 different heterologousnucleic acid sequences are present in the plurality of recombinant phagevectors.

In some embodiments the plurality of recombinant vectors comprises atleast two types of recombinant phage genomes, in which the heterologousnucleic acid sequence is inserted at different locations. In someembodiments the recombinant phage genomes present in the plurality ofvectors are based on the same starting phage genome. Thus, in suchembodiments the heterologous sequence is inserted at different sites inthe same phage genome. In other embodiments the recombinant phagegenomes present in the plurality of vectors are based on at least twodifferent starting phage genomes.

In some embodiments the plurality of recombinant phage genomes comprisesat least 5 types of recombinant phage genomes, in which the heterologousnucleic acid sequence is inserted at different locations. In someembodiments the plurality of recombinant phage genomes comprises atleast 10, at least 15, at least 20, at least 25, at least 30, at least35, at least 40, at least 45, at least 50, at least 60, at least 70, atleast 75, at least 80, at least 85, at least 90, at least 95, at least100, at least 200, at least 300, at least 400, or at least 500 types ofrecombinant phage genomes, in which the heterologous nucleic acidsequence is inserted at different locations.

In some embodiments the plurality of recombinant vectors comprises a common first open reading frame and a plurality of different second open reading frames, and at least 5 different second open reading frames are present in the plurality of recombinant vectors. In some embodiments at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 60, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 200, at least 300, at least 400, or at least 500 different second open reading frames are present in the plurality of recombinant phage vectors.

Collections of recombinant phage genomes and/or recombinant phagecomprising the recombinant genomes are also provided. The collectionsinclude recombinant phage genomes and phages with recombinant genomesthat include at least one starting phage genome, at least oneheterologous insertion sequence, and at least one site of insertion ofthe at least one heterologous insertion sequence in the at least onestarting genome. In some embodiments the collection includes at least 2,3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, or1000 different types of starting phage genome. In some embodiments thecollection includes at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50,60, 70, 80, 90, 100, 500, or 1000 different types of heterologousinsertion sequence. In some embodiments the collection includes at least2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, or1000 different sites of insertion of the at least one heterologousinsertion sequence in the at least one starting genome. Thus, in someembodiments of the collection a single heterologous insertion sequenceis inserted at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70,80, 90, 100, 500, or 1000 different sites in the same starting phagegenome. In other embodiments more than one heterologous insertionsequence is present in the collection and/or more than one startingphage genome is present, and there are at least 2, 3, 4, 5, 6, 7, 8, 9,10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, or 1000 different sites ofinsertion of the heterologous nucleic acid sequence into phage genomespresent in the collection.

In some embodiments the collection of recombinant phage genomes is not packaged into phage particles. For example, in some embodiments the collection of recombinant phage genomes is present in vectors, such as YACs. In some embodiments the vectors are stored in isolated or purified form. In other embodiments the vectors are present in vector host cells, such as yeast, which can be in any form such as a frozen glycerol stock or growing on solid or liquid media.

In some embodiments the collection of recombinant phage genomes is packaged into phage particles.

In some embodiments all or substantially all members of the collection are present together in a mixture, such as a liquid culture that contains phage particles or a liquid culture that contains a library of different yeast cells. In other embodiments all or substantially all members of the collection are stored isolated from one another, such as in different cultures or as different frozen glycerol stocks.

In some embodiments a collection of phage or phage chromosomes is screened to identify a subset of the collection that shares one or more features. For example, if the collection comprises phage genomes from different starting phage, the collection may be screened to identify members of the collection that are capable of infecting a particular type or combination of types of bacteria. Alternatively, the collection may be screened to identify members of the collection that express heterologous open reading frame products above a certain level.

EXAMPLES

The following examples serve to more fully describe the manner of using the invention. These examples are presented for illustrative purposes and should not serve to limit the true scope of the invention.

Example 1 Cloning and Genetically Modifying Phage T3

A. Phage Capture

Phage T3 was cloned and manipulated in the following manner. T3 was grown using E. coli DH10B as a host in Luria Broth (LB)+2 mM calcium chloride. The phage lysate was concentrated via incubation with 10% polyethylene glycol-8000 overnight at 4° C., followed by centrifugation. The pellet was resuspended in SM buffer (Sambrook et al., Molecular Cloning: A Laboratory Manual, 3d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001)). DNA was prepared from the concentrated T3 lysate using the Norgen Phage DNA kit (Cat#46700). The genomic sequence of T3 (NCBI accession #NC_003298) was used to design oligos to capture T3 into the pYES1L vector (Invitrogen®). Oligos used were duplexes of:

[SEQ ID NO: 1] CCTAGTGTACCAGTATGATAGTACATCTCTATGTGTCCCTCCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC and its complement, and duplexes of:

[SEQ ID NO: 2] GAACGACCGAGCGCAGCGGCGGCCGCGCTGATACCGCCGCTCTCATAGTTCAAGAACCCAAAGTACCCCCCCCATAGCCC and its complement.

The oligos were transformed into competent MaV203 yeast cells (Invitrogen®) together with purified T3 DNA and yeast artificial chromosome pYES1L. Transformed cells were plated on synthetic complete media without tryptophan, selecting for the TRP marker on pYES1L. Colonies that grew on synthetic complete trp-minus media were screened by PCR to show successful capture of the T3 genome.
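
Each capture oligo above is used as a duplex of the listed strand and its complement. A minimal Python sketch of deriving the second strand from SEQ ID NO: 1 follows; the helper name `reverse_complement` is an illustrative assumption and is not part of the disclosed method.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

SEQ_ID_NO_1 = (
    "CCTAGTGTACCAGTATGATAGTACATCTCTATGTGTCCCT"
    "CCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC"
)

# Second strand of the duplex, written 5' to 3'.
bottom_strand = reverse_complement(SEQ_ID_NO_1)
print(len(SEQ_ID_NO_1))
print(bottom_strand)
```

Taking the reverse complement twice returns the original strand, which is a convenient sanity check when preparing oligo orders.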

B. YAC to Plaque

Selected MaV203 cells that contained the pYES1L-T3 phage-YAC were grown up and glass-bead lysates were prepared (Invitrogen® High-Order Genetic Assembly kit) and electroporated into TOP10 E. coli. The transformations were mixed with LB+2 mM calcium chloride top agar and plated on an LB+2 mM calcium chloride agar plate. Overnight incubation revealed plaques corresponding to the captured phage. Captured phages typically yielded 1×10² to 1×10⁴ plaques per transformation.

C. Luciferase Insertion into Cloned T3 Phage

Expression cassettes were designed for insertion into different locations of the T3 genome. The cassettes contain an intact luciferase open reading frame inserted to take the place of an endogenous T3 gene such that luciferase expression is driven by the endogenous T3 promoter, followed by the URA3 gene with its own promoter, and optionally a direct repeat of the 3′ end of the luciferase gene. Insertions were made into the T3 0.7 and 4.3 genes. In T3::0.7luc a cassette containing luciferase and URA3 is swapped into the T3 0.7 gene. In T3::0.7DRluc a cassette containing luciferase, URA3, and a direct repeat of the 3′ end of the luciferase gene is swapped into the T3 0.7 gene. In T3::4.3DRluc a cassette containing luciferase, URA3, and a direct repeat of the 3′ end of the luciferase gene is swapped into the T3 4.3 gene. In T3::0.7IceuILuc a cassette containing luciferase, URA3, and an I-CeuI homing endonuclease site is swapped into the T3 0.7 gene.

For insertion, the cassettes were amplified as two or three PCR products: one containing the luciferase and flanking homology to a first site in the phage, the second containing the URA3 gene with flanking homology to the other two PCR products, and the third containing a fragment of luciferase and homology to a different site on the phage chromosome. The constructs were designed to replace the targeted gene without deleting other adjacent sequences. The internal fragment containing URA3 was amplified using primers:

[SEQ ID NO: 3] CCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGTAAACGGATTCACCACTCCAAGA and [SEQ ID NO: 4] ATAATCATAGGTCCTCTGACACATAATTCGCCTCTCTGATTCAACGACAGGAGCACGATC.

The 3′ end of the full luciferase fragment was amplified by:

[SEQ ID NO: 5] AAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTTACAATTTGGACTTTCCGC.

The 5′ end of the shorter luciferase fragment was amplified by:

[SEQ ID NO: 6] TCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAATCAGAGAGGCGAATTATGT.
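
The flanking homology between adjacent PCR products can be checked directly from the primer sequences. The Python sketch below compares the 3′ end of SEQ ID NO: 3 with the reverse complement of SEQ ID NO: 5 and reports the length of the stretch the two primers share. The function names are illustrative assumptions, and the check looks only at the primers themselves, not at the full PCR products.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def shared_core(fwd_primer, rev_primer):
    """Length of the longest suffix of fwd_primer that equals a prefix
    of the reverse complement of rev_primer (0 if there is none)."""
    rc = reverse_complement(rev_primer)
    for k in range(min(len(fwd_primer), len(rc)), 0, -1):
        if fwd_primer[-k:] == rc[:k]:
            return k
    return 0

# URA3 forward primer and full-luciferase reverse primer from the text.
SEQ_ID_NO_3 = "CCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGTAAACGGATTCACCACTCCAAGA"
SEQ_ID_NO_5 = "AAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTTACAATTTGGACTTTCCGC"

print(shared_core(SEQ_ID_NO_3, SEQ_ID_NO_5))
```

The same check can be applied to SEQ ID NO: 4 and SEQ ID NO: 6 to examine the junction on the other side of the URA3 fragment.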

For inserting this duplication cassette at the T3 0.7 gene, the 5′ end of the full luciferase fragment was amplified with:

[SEQ ID NO: 7] AATTTACTCTTTACTCTTACAGATAACAGGACACTGAACGATGGAAGACGCCAAAAACAT, and the 3′ end of the shorter luciferase fragment with:

[SEQ ID NO: 8] ATTCAGGCCACCTCATGATGACCTGTAAGAAAAGACTCTATTACAATTTGGACTTTCCGC.

For insertions at the 4.3 gene site, the 5′ end of the full luciferase fragment was amplified with:

[SEQ ID NO: 9] CTCACTAACGGGAACAACCTCAAACCATAGGAGACACATCATGGAAGACGCCAAAAACAT, and the 3′ end of the shorter luciferase fragment with:

[SEQ ID NO: 10] TGTTTGCGTGCTTGATTGATTTACTCATGTTGTGCTCCTATTACAATTTGGACTTTCCGC.

In each case (0.7 and 4.3 gene sites), 3 PCR products were created and co-transformed into yeast containing the T3-YAC described above. Recombination was selected by growing cells in the absence of uracil. Colonies that grew in the absence of uracil were screened by PCR for presence of the cassette. Colonies positive by PCR were subjected to the YAC-to-plaque technique (described above) to recover viable phages. These phages were subsequently screened by PCR to confirm the presence of the cassette.
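
The locus-specific forward primers for the 0.7 and 4.3 sites can be read as a locus-specific homology arm followed by a common luciferase-annealing region. The Python sketch below splits SEQ ID NO: 7 and SEQ ID NO: 9 on that basis. The 20-base annealing length is an assumption inferred from the sequences themselves, and `split_primer` is a hypothetical helper name.

```python
def split_primer(primer, anneal_len=20):
    """Split a primer into (homology_arm, annealing_region).

    Assumes the last `anneal_len` bases anneal to the template and the
    remaining 5' bases provide homology to the phage target site.
    """
    return primer[:-anneal_len], primer[-anneal_len:]

SEQ_ID_NO_7 = "AATTTACTCTTTACTCTTACAGATAACAGGACACTGAACGATGGAAGACGCCAAAAACAT"  # 0.7 site
SEQ_ID_NO_9 = "CTCACTAACGGGAACAACCTCAAACCATAGGAGACACATCATGGAAGACGCCAAAAACAT"  # 4.3 site

arm_07, anneal_07 = split_primer(SEQ_ID_NO_7)
arm_43, anneal_43 = split_primer(SEQ_ID_NO_9)

print(anneal_07 == anneal_43)    # same annealing region at both sites
print(len(arm_07), len(arm_43))  # lengths of the locus-specific arms
```

Under this reading, retargeting the cassette to a new locus only requires changing the 5′ arm of each outer primer while the annealing region stays fixed.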

D. Expression of Luciferase in Recombinant Phage

An overnight culture of E. coli cells was diluted 1/100 and grown into mid-log phase in LB+1 mM calcium chloride (approximately two and a half hours). Cells were diluted and infected with a vast excess of phages (1×10⁷ phages per infection) in a total of 100 μl. Infections were allowed to proceed, non-shaking, at 37° C. After 90 minutes, 100 μl of Promega® Steady-Glo luciferase detection reagent was added to 20 μl of infection, and infections were immediately read on a Promega® GloMax 20/20. Cells infected with the different engineered phage showed some variation in expression levels, but cells infected with T3::0.7Luc, T3::0.7DRLuc, T3::4.3DRLuc, and T3::0.7IceuILuc all expressed detectable levels of luciferase.

Example 2 Cloning and Genetically Modifying Phage T7

A. Phage Capture

T7 luc was created in a slightly different manner than the engineered T3 phage of Example 1.

T7 dspB (T. K. Lu and J. J. Collins, “Dispersing Biofilms with Engineered Enzymatic Bacteriophage,” Proceedings of the National Academy of Sciences, vol. 104, no. 27, pp. 11197-11202, Jul. 3, 2007, incorporated herein by reference) was captured in pYES1L by transforming genomic DNA of T7 dspB, YAC pYES1L, a duplex of:

[SEQ ID NO: 11] TTGTCTTTGGGTGTTACCTTGAGTGTCTCTCTGTGTCCCTCCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC and its complement, and a duplex of:

[SEQ ID NO: 12] CCCGAACGACCGAGCGCAGCGGCGGCCGCGCTGATACCGCCGCCGCCGGCGTCTCACAGTGTACGGACCTAAAGTTCCCCCATAGGGGGT and its complement, into MaV203 yeast cells (Invitrogen®). Those oligonucleotides bridge the ends of the T7 genomic sequence (NC_001604) and the YAC vector.
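
One way to see the bridging design is to compare SEQ ID NO: 11 with SEQ ID NO: 1 of Example 1. Both oligos are used to capture a genome into pYES1L, so they would be expected to end in a common stretch and to differ toward the 5′ end. The Python sketch below finds the common suffix of the two oligos. Reading that suffix as the vector-specific arm is an interpretation, and because the ends of the two phage genomes may themselves share a few bases, the suffix can extend slightly past the vector arm. `common_suffix` is a hypothetical helper name.

```python
def common_suffix(a, b):
    """Return the longest string that both a and b end with."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]

SEQ_ID_NO_1 = (  # T3 capture oligo from Example 1
    "CCTAGTGTACCAGTATGATAGTACATCTCTATGTGTCCCT"
    "CCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC"
)
SEQ_ID_NO_11 = (  # T7 capture oligo from this example
    "TTGTCTTTGGGTGTTACCTTGAGTGTCTCTCTGTGTCCCT"
    "CCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC"
)

shared = common_suffix(SEQ_ID_NO_1, SEQ_ID_NO_11)
print(len(shared))
print(shared)
```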

B. YAC to Plaque

Cloned T7 phages were shown to be able to YAC-to-plaque, as above.

Selected MaV203 cells that contained the pYES1L-T7 dspB phage-YAC were grown up and glass-bead lysates were prepared (Invitrogen® High-Order Genetic Assembly kit) and electroporated into TOP10 E. coli. The transformations were plated and overnight incubations revealed plaques corresponding to the captured phage.

C. Luciferase Insertion into Cloned T7 Phage

The T7-dspB YAC was purified by glass-bead lysate, and cut with EcoRI and HindIII. Luciferase was amplified with the primers

[SEQ ID NO: 13] TAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGAAGACGCCAAAAACAT and [SEQ ID NO: 14] CCAAGGGGTTAACTAGTTACTCGAGTGCGGCCGCAAGCTTTTACAATTTGGACTTTCCGC.

Duplexed

[SEQ ID NO: 15] ACATTTTCTGGCGTCAGTCCACCAGCTAACATAAAATGTAAGCTTTCGGGGCTCTCTTGCCTTCCAACCCAGTCAGAAAT and its complement was also used to repair the HindIII cut YAC backbone. The cut phage-YAC, luciferase PCR product and duplexed repair oligos were co-transformed into MaV203 yeast cells (Invitrogen®), and selected on media lacking tryptophan, resulting in a single TRP+ colony. Engineered phage-YAC were confirmed by PCR and converted into phage particles via the YAC-to-plaque technique, as described above.

D. Expression of Luciferase in E. coli Infected With Recombinant Phage

An overnight culture of E. coli cells was diluted 1/100 and grown into mid-log phase in LB+1 mM calcium chloride (approximately two and a half hours). Cells were diluted and infected with a vast excess of phages (1×10⁷ phages per infection) in a total of 100 μl. Infections were allowed to proceed, non-shaking, at 37° C. After 90 minutes, 100 μl of Promega® Steady-Glo luciferase detection reagent was added to 20 μl of infection, and infections were immediately read on a Promega® GloMax 20/20. Cells infected with the T7::Luc phage expressed detectable levels of luciferase.

Example 3 Cloning and Genetically Modifying Phage T3

Phage T3 was captured into the pYES1L vector (Invitrogen®) and shown to be functional in the YAC to plaque assay as described in Example 1.

A. Luciferase and Nanoluc Insertion into T3 Phage

The T3 luciferase cassette was constructed as in Example 1.

Promega® vector pNL1.1 was the template for amplification of the nanoluc ORF with primers JHONO319 and JHONO320. pRS426 was used as a template for the Ura3 gene with primers JHONO321 and JHONO322. The sequences of those primers are:

JHONO319 [SEQ ID NO: 16] AATTTACTCTTTACTCTTACAGATAACAGGACACTGAACGATGGTCTTCACACTCGAAGA

JHONO320 [SEQ ID NO: 17] TTACGCCAGAATGCGTTCGCAC

JHONO321 [SEQ ID NO: 18] AGTGACCGGCTGGCGGCTGTGCGAACGCATTCTGGCGTAACAGATTGTACTGAGAGTGCACC

JHONO322 [SEQ ID NO: 19] ATTCAGGCCACCTCATGATGACCTGTAAGAAAAGACTCTACACACCGCATAGGGTAATAACTG.

In each case (luc cassette and nanoluc cassette), 2 PCR products were created and co-transformed into yeast containing the T3-YAC described above in Example 1. Recombination was selected by growing cells in the absence of uracil. Colonies that grew in the absence of uracil were screened by PCR for presence of the cassette. Colonies positive by PCR were subjected to the YAC-to-plaque technique (described above) to recover viable phages. These phages were subsequently screened by PCR to confirm the presence of the cassette. Note that for these cassettes the URA3 gene was not excised.

B. Expression of Luciferase and Nanoluc in E. coli Infected with Recombinant Phage

Replacement of the T3 0.7 gene with the luc and nanoluc cassettes allowed for quantitative comparison of the two open reading frames. The titer of luciferase expression phage was determined and a dilution series of NEB-10b cells was then infected with the same number of infective bacteriophage. This strategy allows for a direct comparison of the activity of the luc and nanoluc open reading frames. FIG. 4 reports the results of this experiment as relative luminescence units/number of infective phage (RLU/PFU). These data show that the nanoluc ORF produces a higher ratio of RLU/PFU than the luc cassette.
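
The RLU/PFU normalization described above is a simple ratio of the luminometer reading to the number of infective phage used. A minimal Python sketch follows. The readings in it are hypothetical placeholders chosen only to show the arithmetic; they are not the values reported in FIG. 4, and the function name is an assumption.

```python
def rlu_per_pfu(rlu, pfu):
    """Normalize a luminescence reading (RLU) by the number of
    infective phage (PFU) used in the infection."""
    if pfu <= 0:
        raise ValueError("pfu must be positive")
    return rlu / pfu

# Hypothetical (RLU, PFU) pairs, for illustration only.
readings = {
    "luc": (2.0e5, 1.0e6),
    "nanoluc": (5.0e7, 1.0e6),
}

ratios = {name: rlu_per_pfu(rlu, pfu) for name, (rlu, pfu) in readings.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3g} RLU/PFU")
```

Because both phage are applied at the same PFU, differences in the ratio reflect differences in signal per infective particle rather than differences in phage dose.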

Example 4 Cloning and Genetically Modifying Felix Phage

A. Phage Capture

Felix was grown using Salmonella LT2 as a host in LB+2 mM CaCl2. A phage lysate was prepared and concentrated via NaCl/PEG precipitation/cesium chloride gradient. The genomic sequence of Felix was used to design capture oligos to capture Felix into the pYES1L vector (Invitrogen®). Oligos used were duplexes of:

DBONO184 [SEQ ID NO: 20] GAGTTCAACTTCTTTGGAGACATCTCAAGCACAGATTACAGATCCACTAGTTCTAGAGCGGCCGCCACCGCGGTGGAGCT and its complement, and

DBONO185 [SEQ ID NO: 21] AGCGCGCGTAATACGACTCACTATAGGGCGAATTGGGTACATGACACCTGAAATGTTCAGCCTTCTGAGTTCTGGTGTAT and its complement.

The oligos were transformed into competent MaV203 yeast cells (Invitrogen®) together with purified Felix DNA and yeast artificial chromosome pRS414. Transformed cells were plated on synthetic complete media without tryptophan, selecting for the TRP marker on pYES1L. Colonies that grew on synthetic complete trp-minus media were screened by PCR and DNA sequencing to show successful capture of the Felix genome.

B. YAC to Plaque

Strains bearing Felix01_Phage_YACs were unable to support phage production in Salmonella enterica serovar Typhimurium LT2 cells (ATCC 19585) using the standard YAC to plaque assay described in the preceding examples. However, electroporation of the Felix01_phage_YAC into NEB-10b cells generated a lysate that contained infectious Felix01 bacteriophage that were then used to form plaques in an infection of Salmonella enterica serovar Typhimurium LT2 (ATCC 19585). This process has been called surrogate transformation and in this case allowed for derivation of cloned engineered Felix01 phage capable of infecting host Salmonella enterica serovar Typhimurium LT2 (ATCC 19585) cells.

C. Luciferase Insertion into Cloned Felix Phage

Expression cassettes were designed for insertion into different locations of the Felix genome. For insertion, the cassettes were amplified as three PCR products: one containing the luciferase and flanking homology to a first site in the phage, the second containing the URA3 gene with flanking homology to the other two PCR products, and the third containing a fragment of luciferase and homology to a different site on the phage chromosome. The internal fragment containing URA3 was amplified using primers:

[SEQ ID NO: 3] CCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGTAAACGGATTCACCACTCCAAGA and [SEQ ID NO: 4] ATAATCATAGGTCCTCTGACACATAATTCGCCTCTCTGATTCAACGACAGGAGCACGATC.

The 3′ end of the full luciferase fragment was amplified by:

[SEQ ID NO: 5] AAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTTACAATTTGGACTTTCCGC.

The 5′ end of the shorter luciferase fragment was amplified by:

[SEQ ID NO: 6] TCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAATCAGAGAGGCGAATTATGT.

The 5′ end of luciferase and the 3′ end of the truncated luciferase (luc*) contain sequences specific for the targeted locus for integration. In the case of Felix 01 (NCBI accession NC_005282) integration gene cassettes were made to target the following loci of the Felix01 genome: GP37, ORF51, ORF83, ORF19, ORF23, ORF46, ORF83. For ORF51, ORF83, ORF19, ORF23, ORF46, and ORF83 the cassette replaced the endogenous open reading frame. GP37 is a tail fiber gene and for it the insertion was at a downstream location and included an introduced Shine-Dalgarno sequence upstream of luciferase.

The locus-specific oligonucleotides used to amplify the 5′ luciferase (F) and 3′ luciferase* (R) are:

GP37 [SEQ ID NO: 22] (F)-TTCTATAAGCTGATGGCTTGGGTAAGAACTGCTTAATCCCAGGAAACAGGATCCAAATGGAAGACGCCAAAAACAT [SEQ ID NO: 23] (R)-CATAAAGAATATTAACACCATCTTAACAATCAGTCAATAATTACAATTTGGACTTTCCGC

ORF51 [SEQ ID NO: 24] (F)-TTTTAAGGGGAAACGAGATTTATTATTTGGAGAAAACATAATGGAAGACGCCAAAAACAT [SEQ ID NO: 25] (R)-TAACAGCATTTAAGTCCATTAAGCGCCTCCGCAAATAGAATTACAATTTGGACTTTCCGC

ORF83 [SEQ ID NO: 26] (F)-GATGACATCAAGTGTCTGTTCCCATAATAGGTGATTAACTATGGAAGACGCCAAAAACAT [SEQ ID NO: 27] (R)-TAGGTGTTCCATCAGACTCATAGCAGTGTTCAATTTTCATTTACAATTTGGACTTTCCGC

ORF19 [SEQ ID NO: 28] (F)-GGTTTTTAGATAGATTAAATTACACATCAACGGGGAGGGAATGGAAGACGCCAAAAACAT [SEQ ID NO: 29] (R)-GGGCTTACTTTACAGACTTTTAAGCCCCATGTAAAGCACTTTACAATTTGGACTTTCCGC

ORF23 [SEQ ID NO: 30] (F)-CTCCCCACTAAATAAAACCCTTAAACTAGGAGATTCTAAAATGGAAGACGCCAAAAACAT [SEQ ID NO: 31] (R)-CTGTTAGGGTATCTGGGGCTATTTAGCCCCGCTGCGTCGATTACAATTTGGACTTTCCGC

ORF46 [SEQ ID NO: 32] (F)-GCCAAACTGTCTTGAAAACAGTTGCCACTGTAGAGATACGATGGAAGACGCCAAAAACAT [SEQ ID NO: 33] (R)-ACAACAAGCGGTAATAACCTTAGAAGCCCTCTAAAAAGACTTACAATTTGGACTTTCCGC

ORF83 [SEQ ID NO: 34] (F)-GATGACATCAAGTGTCTGTTCCCATAATAGGTGATTAACTATGGAAGACGCCAAAAACAT [SEQ ID NO: 35] (R)-TAGGTGTTCCATCAGACTCATAGCAGTGTTCAATTTTCATTTACAATTTGGACTTTCCGC.

Recombinants for luciferase cassette integrations at the target loci were confirmed with junctional PCR spanning the recombinant junctions. Sequencing of those PCR products revealed that the desired integrations had occurred.

Surrogate transformations of the engineered Felix 01 phage into NEB-10b cells were attempted as described previously. Many of these transformations resulted in plaques of the starting wild-type Felix 01 phage. PCR primer combinations that could amplify either a recombinant YAC or a wild-type Felix01 YAC detected both products in many clones. This result suggested the presence of a heterogeneous population of cells. Streaking of cells on ura-minus, leu-minus plates yielded single colonies of mixed genotype. As an alternative strategy, genomic DNA was isolated from these cells and re-transformed into yeast (haploid and diploid). No re-transformants were obtained.

Without wishing to be bound by any particular theory, these data suggest the possibility that there may be an extra, wild-type phage YAC present in the cells that will not segregate away under selection. This could occur, for example, if the diploid host cells maintain multiple copies of the plasmid.

Another possible explanation is that the increase in genome size resulting from adding ~3 kb to the phage genome causes problems in phage DNA packaging. Some phages are unable to tolerate increases in genome size this large and Felix 01 may be such a phage. In that regard it is noteworthy that the engineering platform developed herein allows for quick and easy testing of the tolerance of any phage to the addition of DNA to its genome. The high throughput enabled by this disclosure allows for screening of large numbers of phage in parallel and selection of those with any desired property or properties. That approach may be used to select one or a set of phage amenable to engineering.

With respect to engineering Felix 01, one option is to use a haploid strain to capture Felix 01. If the diploid genome of the strain that was used is the impediment, then this will allow isolation of pure engineered phage_YACs, leading to engineered phages via surrogate transformation.

If the genome size is the impediment, an alternative strategy is to remove portions of the Felix 01 genome that are not necessary for phage replication and thereby reduce the net addition of DNA to the Felix 01 genome.

Example 5 Cloning A511 Phage

A511 is a phage that specifically infects Listeria cells. The A511 genome (NC_009811) is 137,619 nucleotides long and characterized by a 3125 bp terminal repeat. The A511 genome was captured using YAC pRS415, linearized with BamHI and XhoI and treated with NEBNext end repair module (New England Biolabs).

For capture of the A511 phage genome two different stitching oligonucleotide strategies were used (see FIG. 2). In the first, 80 bp double stranded stitching oligos bridging the ends of the phage genome and the YAC insertion sites were used. The first stitching oligo was

DBONO192 [SEQ ID NO: 36] AAATAAAAAAAAAATAAAACCAAAACCTGTAAAGCGCCCCGATCCACTAGTTCTAGAGCGGCCGCCACCGCGGTGGAGCT and its complement.

The second stitching oligo was

DBONO199 [SEQ ID NO: 37] TACGACTCACTATAGGGCGAATTGGGTACCGGGCCCCCCCAGCATTTTTTTCACACGGTGTCAACTCAAAAGGCTTATAT and its complement.

In the second strategy stitching oligos of approximately 600 bases were constructed using a crossover PCR approach. Building the 600 bp fragments by PCR is a 2-step process. In the first step, the end regions of the phage and linearized vector are amplified. For example, DBONO186 and DBONO189 amplify the end of the A511 genome. DBONO189 adds 20 bp of homology to one linearized end of pRS415. DBONO188 and DBONO187 amplify that end of pRS415, with DBONO188 adding 20 bp of homology to the end of A511. These PCR products were generated, purified using a QIAGEN PCR purification kit, and then diluted 1:10. 1 μl of each of these diluted products was used as template for the crossover PCR to generate the first 600 bp fragment.

The second 600 bp fragment was generated in a similar fashion. DBONO197 and DBONO194 amplify the other end of the A511 genome. DBONO197 adds 20 bp of homology to one linearized end of pRS415. DBONO193 and DBONO198 amplify that end of pRS415, with DBONO198 adding 20 bp of homology to the end of A511. These PCR products were generated, purified using a QIAGEN PCR purification kit, and then diluted 1:10. 1 μl of each of these diluted products was used as template for the crossover PCR to generate the second 600 bp fragment.

The oligonucleotides used for PCR to generate the 600 bp fragments are:

DBONO186 [SEQ ID NO: 38] GGTACCTTCGAGGCTAGCGG; DBONO187 [SEQ ID NO: 39] GCGCGTTGGCCGATTCATTA; DBONO188 [SEQ ID NO: 40] CAAAACCTGTAAAGCGCCCCGATCCACTAGTTCTAGAGCG; DBONO189 [SEQ ID NO: 41] CGCTCTAGAACTAGTGGATCGGGGCGCTTTACAGGTTTTG; DBONO193 [SEQ ID NO: 42] TAGGGCGCTGGCAAGTGTAG; DBONO194 [SEQ ID NO: 43] TCTTCTTTTTCATAAGATGCCTACACC; DBONO197 [SEQ ID NO: 44] ATTGGGTACCGGGCCCCCCCAGCATTTTTTTCACACGGTG; and DBONO198 [SEQ ID NO: 45] CACCGTGTGAAAAAAATGCTGGGGGGGCCCGGTACCCAAT.
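
In crossover PCR the two first-step products are joined through the overlap created by the inner primers, so each inner primer pair should be mutually complementary. The Python sketch below checks this for the DBONO188/DBONO189 and DBONO197/DBONO198 pairs listed above. The helper names are illustrative assumptions rather than part of the disclosed procedure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def is_crossover_pair(primer_a, primer_b):
    """True if the two primers are exact reverse complements."""
    return reverse_complement(primer_a) == primer_b

DBONO188 = "CAAAACCTGTAAAGCGCCCCGATCCACTAGTTCTAGAGCG"
DBONO189 = "CGCTCTAGAACTAGTGGATCGGGGCGCTTTACAGGTTTTG"
DBONO197 = "ATTGGGTACCGGGCCCCCCCAGCATTTTTTTCACACGGTG"
DBONO198 = "CACCGTGTGAAAAAAATGCTGGGGGGGCCCGGTACCCAAT"

print(is_crossover_pair(DBONO188, DBONO189))
print(is_crossover_pair(DBONO197, DBONO198))
```

Each of these primers is 40 bases long, which is consistent with the description of 20 bp of added homology joined to a 20-base template-specific portion.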

Cotransforming yeast cells with linear pRS415, phage A511 genomic DNA purified as described above, and either the pair of 80 bp stitching oligos or the pair of 600 bp fragments was used to attempt to capture the A511 genome in the YAC. Out of 22 resulting clones analyzed using the 80 bp stitching oligos none contained the A511 genome. In contrast, in two experiments using the 600 bp fragments with different pRS415 DNA preps, 5 of 48 and 23 of 47 clones were found to contain the A511 genome. PCR was used to confirm the presence of intact termini of the A511 genome in the A511-YACs.
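
The clone counts reported above can be expressed as capture rates for easier comparison. The Python sketch below does this arithmetic, assuming (as the contrast in the text implies) that the 5 of 48 and 23 of 47 results were obtained with the 600 bp fragments. The labels and function name are illustrative.

```python
def capture_rate(positive, total):
    """Fraction of analyzed clones that contain the captured genome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return positive / total

# (positive clones, clones analyzed) as reported in the text.
results = {
    "80 bp stitching oligos": (0, 22),
    "600 bp fragments, first reported experiment": (5, 48),
    "600 bp fragments, second reported experiment": (23, 47),
}

for label, (positive, total) in results.items():
    rate = capture_rate(positive, total)
    print(f"{label}: {positive}/{total} = {rate:.1%}")
```

The spread between the two 600 bp experiments suggests that vector preparation quality may also influence capture efficiency, although the counts are too small to draw a firm conclusion from this calculation alone.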

Informal Sequence Listing:

The following nucleotide sequences are referenced in this application:

Sequence ID Number  Sequence

1  CCTAGTGTACCAGTATGATAGTACATCTCTATGTGTCCCTCCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC
2  GAACGACCGAGCGCAGCGGCGGCCGCGCTGATACCGCCGCTCTCATAGTTCAAGAACCCAAAGTACCCCCCCCATAGCCC
3  CCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGTAAACGGATTCACCACTCCAAGA
4  ATAATCATAGGTCCTCTGACACATAATTCGCCTCTCTGATTCAACGACAGGAGCACGATC
5  AAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTTACAATTTGGACTTTCCGC
6  TCCTGGCCACGGGTGCGCATGATCGTGCTCCTGTCGTTGAATCAGAGAGGCGAATTATGT
7  AATTTACTCTTTACTCTTACAGATAACAGGACACTGAACGATGGAAGACGCCAAAAACAT
8  ATTCAGGCCACCTCATGATGACCTGTAAGAAAAGACTCTATTACAATTTGGACTTTCCGC
9  CTCACTAACGGGAACAACCTCAAACCATAGGAGACACATCATGGAAGACGCCAAAAACAT
10  TGTTTGCGTGCTTGATTGATTTACTCATGTTGTGCTCCTATTACAATTTGGACTTTCCGC
11  TTGTCTTTGGGTGTTACCTTGAGTGTCTCTCTGTGTCCCTCCTCGCCGCAGTTAATTAAAGTCAGTGAGCGAGGAAGCGC
12  CCCGAACGACCGAGCGCAGCGGCGGCCGCGCTGATACCGCCGCCGCCGGCGTCTCACAGTGTACGGACCTAAAGTTCCCCCATAGGGGGT
13  TAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGGAAGACGCCAAAAACAT
14  CCAAGGGGTTAACTAGTTACTCGAGTGCGGCCGCAAGCTTTTACAATTTGGACTTTCCGC
15  ACATTTTCTGGCGTCAGTCCACCAGCTAACATAAAATGTAAGCTTTCGGGGCTCTCTTGCCTTCCAACCCAGTCAGAAAT
16  AATTTACTCTTTACTCTTACAGATAACAGGACACTGAACGATGGTCTTCACACTCGAAGA
17  TTACGCCAGAATGCGTTCGCAC
18  AGTGACCGGCTGGCGGCTGTGCGAACGCATTCTGGCGTAACAGATTGTACTGAGAGTGCACC
19  ATTCAGGCCACCTCATGATGACCTGTAAGAAAAGACTCTACACACCGCATAGGGTAATAACTG
20  GAGTTCAACTTCTTTGGAGACATCTCAAGCACAGATTACAGATCCACTAGTTCTAGAGCGGCCGCCACCGCGGTGGAGCT
21  AGCGCGCGTAATACGACTCACTATAGGGCGAATTGGGTACATGACACCTGAAATGTTCAGCCTTCTGAGTTCTGGTGTAT
22  TTCTATAAGCTGATGGCTTGGGTAAGAACTGCTTAATCCCAGGAAACAGGATCCAAATGGAAGACGCCAAAAACAT
23  CATAAAGAATATTAACACCATCTTAACAATCAGTCAATAATTACAATTTGGACTTTCCGC
24  TTTTAAGGGGAAACGAGATTTATTATTTGGAGAAAACATAATGGAAGACGCCAAAAACAT
25  TAACAGCATTTAAGTCCATTAAGCGCCTCCGCAAATAGAATTACAATTTGGACTTTCCGC
26  GATGACATCAAGTGTCTGTTCCCATAATAGGTGATTAACTATGGAAGACGCCAAAAACAT
27  TAGGTGTTCCATCAGACTCATAGCAGTGTTCAATTTTCATTTACAATTTGGACTTTCCGC
28  GGTTTTTAGATAGATTAAATTACACATCAACGGGGAGGGAATGGAAGACGCCAAAAACAT
29  GGGCTTACTTTACAGACTTTTAAGCCCCATGTAAAGCACTTTACAATTTGGACTTTCCGC
30  CTCCCCACTAAATAAAACCCTTAAACTAGGAGATTCTAAAATGGAAGACGCCAAAAACAT
31  CTGTTAGGGTATCTGGGGCTATTTAGCCCCGCTGCGTCGATTACAATTTGGACTTTCCGC
32  GCCAAACTGTCTTGAAAACAGTTGCCACTGTAGAGATACGATGGAAGACGCCAAAAACAT
33  ACAACAAGCGGTAATAACCTTAGAAGCCCTCTAAAAAGACTTACAATTTGGACTTTCCGC
34  GATGACATCAAGTGTCTGTTCCCATAATAGGTGATTAACTATGGAAGACGCCAAAAACAT
35  TAGGTGTTCCATCAGACTCATAGCAGTGTTCAATTTTCATTTACAATTTGGACTTTCCGC
36  AAATAAAAAAAAAATAAAACCAAAACCTGTAAAGCGCCCCGATCCACTAGTTCTAGAGCGGCCGCCACCGCGGTGGAGCT
37  TACGACTCACTATAGGGCGAATTGGGTACCGGGCCCCCCCAGCATTTTTTTCACACGGTGTCAACTCAAAAGGCTTATAT
38  GGTACCTTCGAGGCTAGCGG
39  GCGCGTTGGCCGATTCATTA
40  CAAAACCTGTAAAGCGCCCCGATCCACTAGTTCTAGAGCG
41  CGCTCTAGAACTAGTGGATCGGGGCGCTTTACAGGTTTTG
42  TAGGGCGCTGGCAAGTGTAG
43  TCTTCTTTTTCATAAGATGCCTACACC
44  ATTGGGTACCGGGCCCCCCCAGCATTTTTTTCACACGGTG
45  CACCGTGTGAAAAAAATGCTGGGGGGGCCCGGTACCCAAT

While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.

What is claimed is:
1. A method of making a cloned phage genome, comprising: providing a vector; inserting a starting phage genome into the vector to provide a recombinant vector; and propagating the recombinant vector in a vector host cell that is not a phage host cell to thereby provide the cloned phage genome.
2. A method of making a recombinant phage genome, comprising: providing vector host cells comprising a recombinant vector comprising a cloned phage genome; inserting a heterologous nucleic acid sequence into the starting phage genome to provide a recombinant phage genome; and selecting vector host cells comprising the recombinant vector comprising the recombinant phage genome to thereby provide the cloned phage genome.
3. A method of making a recombinant phage genome, comprising: providing a yeast artificial chromosome comprising a cloned phage genome; and inserting a heterologous nucleic acid sequence into the cloned phage genome to provide a recombinant phage genome.
4. A method of making a phage, comprising: providing a yeast artificial chromosome comprising a phage genome; transforming the yeast artificial chromosome into competent phage host cells; and isolating phage particles comprising the phage genome produced by the transformed phage host cells.